METHOD FOR ASSEMBLY OF A POLYNUCLEOTIDE
ENCODING A TARGET POLYPEPTIDE 



BACKGROUND OF THE INVENTION



The present invention relates generally to the 
area of bioinformatics and more specifically to methods,
algorithms and apparatus for computer directed 
polynucleotide assembly. The invention further relates 
to the production of polypeptides encoded by 
polynucleotides assembled by the invention. 

Enzymes, antibodies, receptors and ligands are 
polypeptides that have evolved by selective pressure to 
perform very specific biological functions within the 
milieu of a living organism. The use of a polypeptide 
for specific technological applications may require the 
polypeptide to function in environments or on substrates 
for which it was not evolutionarily selected. 
Polypeptides isolated from microorganisms that thrive in 
extreme environments provide ample evidence that these 
molecules are, in general, malleable with regard to 
structure and function. However, the process for 
isolating a polypeptide from its native environment is 
expensive and time consuming. Thus, new methods for 
synthetically evolving genetic material encoding a 
polypeptide possessing a desired activity are needed. 

There are two ways to obtain genetic material 
for genetic engineering manipulations: (1) isolation and 
purification of a polynucleotide in the form of DNA or 
RNA from natural sources or (2) the synthesis of a 
polynucleotide using various chemical-enzymatic
approaches. The former approach is limited to naturally- 
occurring sequences that do not easily lend themselves to 
specific modification. The latter approach is much more 
complicated and labor-intensive. However, the chemical- 
enzymatic approach has many attractive features including 
the possibility of preparing, without any significant 
limitations, any desirable polynucleotide sequence. 

Two general methods currently exist for the 
synthetic assembly of oligonucleotides into long 
polynucleotide fragments. In the first method,
oligonucleotides covering the entire sequence to be
synthesized are allowed to anneal, and the nicks are then
repaired with ligase. The fragment is then cloned directly,
or cloned after amplification by the polymerase chain
reaction (PCR). The polynucleotide is subsequently used for in
vitro assembly into longer sequences. The second general 
method for gene synthesis utilizes polymerase to fill in 
single-stranded gaps in the annealed pairs of 
oligonucleotides. After the polymerase reaction, single- 
stranded regions of oligonucleotides become double- 
stranded, and after digestion with restriction 
endonuclease, can be cloned directly or used for further 
assembly of longer sequences by ligating different 
double-stranded fragments. Typically, subsequent to the 
polymerase reaction, each segment must be cloned, which
significantly delays the synthesis of long DNA fragments 
and greatly decreases the efficiency of this approach. 

The creation of entirely novel polynucleotides,
or the substantial modification of existing
polynucleotides, is extremely time consuming and expensive,
requires multiple complex steps, and in some cases is
impossible. Therefore, there exists a great need for an
efficient means to assemble synthetic polynucleotides of
any desired sequence. Such a method could be universally
applied. For example, the method could be used to
efficiently make an array of polynucleotides having
specific substitutions in a known sequence that is
expressed and screened for improved function. The
present invention satisfies these needs by providing
efficient and powerful methods and compositions for the
synthesis of a target polynucleotide encoding a target
polypeptide.

SUMMARY OF THE INVENTION 

The present invention provides methods for the
synthetic assembly of polynucleotides and related
algorithms. In particular, the present invention
provides fast and efficient methods for generating any
nucleic acid sequence, including entire genes,
chromosomal segments, chromosomes and genomes. Because
the methods are completely synthetic, there are no
limitations, such as the availability of existing
nucleic acids, to hinder the construction of even very
large segments of nucleic acid.

BRIEF DESCRIPTION OF THE DRAWINGS 

Like reference symbols in the various drawings
indicate like elements.

Figure 1 depicts 96-well plates for F (i.e.,
"forward" or "plus strand") oligonucleotide synthesis, R
(i.e., "reverse" or "minus strand") oligonucleotide
synthesis, and a T (i.e., "temperature") plate for the
annealing of F and R oligonucleotides.



Figure 2 depicts the oligonucleotide pooling
plan where F oligonucleotides and R oligonucleotides are
annealed to form a contiguous polynucleotide.

Figure 3 depicts the schematic of assembly of a
target polynucleotide sequence defining a gene, genome,
set of genes or polypeptide sequence. The sequence is
designed by computer and used to generate a set of parsed
oligonucleotide fragments covering the + and - strands of
a target polynucleotide sequence encoding a target
polypeptide.



Figure 4 depicts a schematic of the
polynucleotide synthesis modules. A nanodispensing head
with a plurality of valves will deposit synthesis
chemicals in assembly vessels. Chemical distribution
from the reagent reservoir can be controlled using a
syringe pump. Underlying the reaction chambers is a set
of assembly vessels linked to microchannels that will
move fluids by microfluidics.



Figure 5 depicts that oligonucleotide
synthesis, oligonucleotide assembly by pooling and
annealing, and ligation can be accomplished using
microfluidic mixing.



Figure 6 depicts the sequential pooling of
oligonucleotides synthesized in arrays.

Figure 7 depicts the pooling stage of the
oligonucleotide components through the manifold
assemblies resulting in the complete assembly of all
oligonucleotides from the array.

Figure 8 depicts an example of an assembly
module comprising a complete set of pooling manifolds
produced using microfabrication in a single unit.
Various configurations of the pooling manifold will allow
assembly of increased numbers of well arrays of parsed
component oligonucleotides.



Figure 9 depicts the configuration for the
assembly of oligonucleotides synthesized in a pre-defined
array. Passage through the assembly device in the
presence of DNA ligase and other appropriate buffer and
chemical components will facilitate double-stranded
polynucleotide assembly.

Figure 10 depicts an example of the pooling
device design. Microgrooves or microfluidic channels are
etched into the surface of the pooling device. The
device provides a microreaction vessel at the junction of
two channels for 1) mixing of the two streams, 2)
controlled temperature maintenance or cycling at the site
of the junction and 3) expulsion of the ligated mixture
from the exit channel into the next set of pooling and
ligation chambers.

Figure 11 depicts the design of a
polynucleotide synthesis platform comprising microwell
plates addressed with a plurality of channels for
microdispensing.

Figure 12 depicts an example of a high capacity
polynucleotide synthesis platform using high density
microwell microplates capable of synthesizing in excess
of 1536 component oligonucleotides per plate.

Figure 13 depicts a polynucleotide assembly
format using surface-bound oligonucleotide synthesis
rather than soluble synthesis. In this configuration,
oligonucleotides are synthesized with a linker that
allows attachment to a solid support.

Figure 14 depicts a diagram of systematic
polynucleotide assembly on a solid support. A set of
parsed component oligonucleotides is arranged in an
array with a stabilizer oligonucleotide attached. A set
of ligation substrate oligonucleotides is placed in the
solution and systematic assembly is carried out in the
solid phase by sequential annealing, ligation and
melting.

Figure 15 depicts polynucleotide assembly using
component oligonucleotides bound to a set of metal
electrodes on a microelectronic chip. Each electrode can
be controlled independently with respect to current and
voltage.

Figure 16 depicts generally a primer extension
assembly method of the invention.

Figure 17 provides a system diagram of the
invention.

Figure 18 depicts a perspective view of an
instrument of the invention.

Figure 19 depicts two flow-charts showing the
generation of self-assembling oligonucleotide arrays.

DETAILED DESCRIPTION OF THE INVENTION

The elucidation of the complete sequence of
complex genomes, including the human genome, allows for
large scale functional approaches to genetics. The
present invention provides a novel approach to utilizing
the results of genomic sequence information by computer-
directed polynucleotide assembly based upon information
available in databases such as the human genome database.
Specifically, the present invention can be used to
synthesize, assemble and select a novel, synthetic target
polynucleotide sequence encoding a target polypeptide.
The target polynucleotide can encode a target polypeptide
that exhibits enhanced or altered biological activity as
compared to a model polypeptide encoded by a natural
(wild-type) or model polynucleotide sequence.
Subsequently, standard assays can be used to survey the
activity of an expressed target polypeptide. For
example, the expressed target polypeptide can be assayed
to determine its ability to carry out the function of the
corresponding model polypeptide or to determine whether a
target polypeptide exhibiting a new function has been
produced. Thus, the present invention provides a means
to direct the synthetic evolution of a model polypeptide
by computer-directed synthesis of a polynucleotide
encoding a target polypeptide derived from a model
polypeptide.

In one embodiment, the invention provides a
method of synthesizing a target polynucleotide by
providing a target polynucleotide sequence and
identifying at least one initiating oligonucleotide
present in the target polynucleotide which includes at
least one plus strand oligonucleotide annealed to at
least one minus strand oligonucleotide resulting in a
partially double-stranded polynucleotide comprised of a
5' overhang and a 3' overhang. Subsequently, a next most
terminal oligonucleotide can be added in a process that
is repeated systematically to sequentially assemble a
double-stranded polynucleotide. In the various
embodiments provided by the assembly methods of the
invention, a next most terminal oligonucleotide, which
can be either single-stranded or double-stranded, can be
added so as to extend the initiating oligonucleotide in
an alternating bi-directional manner, in a uni-
directional manner, or any combination thereof.

As used herein, a "target polynucleotide
sequence" includes any nucleic acid sequence suitable for
encoding a target polypeptide that can be synthesized by
a method of the invention. A target polynucleotide
sequence can be used to generate a target polynucleotide
using an apparatus capable of assembling nucleic acid
sequences. Generally, a target polynucleotide sequence
is a linear segment of DNA having a double-stranded
region; the segment can be of any length sufficiently
long to be created by the hybridization of at least two
oligonucleotides having complementary regions. It is
contemplated that a target polynucleotide can be 100,
200, 300, 400, 800, 1000, 1500, 2000, 4000, 8000, 10000,
12000, 18000, 20000, 40000, 80000 or more base pairs
in length. The methods of the present invention can be
utilized to create entire artificial genomes of lengths
comparable to known bacterial, yeast, viral, mammalian,
amphibian, reptilian, or avian genomes. In more
particular embodiments, the target polynucleotide is a
gene encoding a polypeptide of interest. The target
polynucleotide can further include non-coding elements
such as origins of replication, telomeres, promoters,
enhancers, transcription and translation start and stop
signals, introns, exon splice sites, chromatin scaffold
components and other regulatory sequences. The target
polynucleotide can comprise multiple genes, chromosomal
segments, chromosomes and even entire genomes. A
polynucleotide of the invention can be derived from
prokaryotic or eukaryotic sequences including bacterial,
yeast, viral, mammalian, amphibian, reptilian, avian,
plant and archaebacterial sequences, and sequences of
other DNA-containing living organisms.



An "oligonucleotide", as used herein, is
defined as a molecule comprised of two or more
deoxyribonucleotides or ribonucleotides, preferably more
than three. Oligonucleotides are small DNA segments,
single-stranded or double-stranded, comprised of the
nucleotide bases linked through phosphate bonds. The
exact size of an oligonucleotide depends on many factors,
such as the reaction temperature, salt concentration, the
presence of denaturants such as formamide, and the degree
of complementarity with the sequence to which the
oligonucleotide is intended to hybridize.



Nucleotides are present in either DNA or RNA
and encompass adenine, cytosine, guanine and thymine or
uracil, respectively, as base, and a sugar moiety being
deoxyribose or ribose, respectively. It will be
appreciated however that other modified bases capable of
base pairing with one of the conventional bases, adenine,
cytosine, guanine, thymine and uracil, can be used in an
oligonucleotide employed in the present invention. Such
modified bases include for example 8-azaguanine and
hypoxanthine. If desired the nucleotides can carry a
label or marker so that on incorporation into a primer
extension product, they augment the signal associated
with the primer extension product, for example for
capture on to solid phase.



A plus strand oligonucleotide, by convention,
includes a short, single-stranded DNA segment that starts
with the 5' end to the left as one reads the sequence. A
minus strand oligonucleotide includes a short, single-
stranded DNA segment that starts with the 3' end to the
left as one reads the sequence. Methods of synthesizing
oligonucleotides are found in, for example,
Oligonucleotide Synthesis: A Practical Approach, Gait,
ed., IRL Press, Oxford (1984), incorporated herein by
reference in its entirety. Solid-phase synthesis
techniques have been provided for the synthesis of
several peptide sequences on, for example, a number of
"pins" (See e.g., Geysen et al., J. Immun. Meth. (1987)
102:259-274, incorporated herein by reference in its
entirety).
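
The plus and minus strand reading convention described above can be illustrated with a minimal Python sketch. The function names are hypothetical and are not part of the invention; the sketch assumes unmodified DNA bases (A, C, G, T) only. It shows that a minus strand written with its 3' end to the left is the base-by-base complement of the plus strand, whereas the same minus strand written 5' to 3' is the reverse complement.

```python
# Illustrative sketch only; helper names are hypothetical.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def minus_strand_as_written(plus_5to3: str) -> str:
    """Minus strand written with its 3' end to the left (plain complement)."""
    return plus_5to3.upper().translate(COMPLEMENT)


def reverse_complement(seq_5to3: str) -> str:
    """The complementary strand rewritten in the 5'->3' direction."""
    return minus_strand_as_written(seq_5to3)[::-1]


if __name__ == "__main__":
    plus = "ATGGCC"
    print("5'-" + plus + "-3'  plus strand")
    print("3'-" + minus_strand_as_written(plus) + "-5'  minus strand")
```

Under this convention the two printed lines align base-for-base, which is how annealed F and R oligonucleotides are usually drawn.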



Additional methods of forming large arrays of
oligonucleotides and other polymer sequences in a short
period of time have been devised. Of particular note,
Pirrung et al., U.S. Pat. No. 5,143,854 (see also PCT
Application No. WO 90/15070), Fodor et al., PCT
Publication No. WO 92/10092 and Winkler et al., U.S. Pat.
No. 6,136,269, all incorporated herein by reference,
disclose methods of forming vast arrays of polymer
sequences using, for example, light-directed synthesis
techniques. See also, Fodor et al., Science (1991)
251:767-777, also incorporated herein by reference in its
entirety. Some work has been done to automate synthesis
of polymer arrays. For example, Southern, PCT
Application No. WO 89/10977, describes the use of a
conventional pen plotter to deposit three different
monomers at twelve distinct locations on a substrate.

An "initiating" oligonucleotide or
polynucleotide sequence, as used herein, is an
oligonucleotide or polynucleotide sequence that serves as
the first or starting sequence that is sequentially
extended by systematic addition of a next most terminal
oligonucleotide or a next most terminal component
polynucleotide. An initiating oligonucleotide or
polynucleotide sequence can have a 5' overhang, a 3'
overhang, or a 5' and a 3' overhang of either strand. An
initiating oligonucleotide or polynucleotide sequence can
be extended in an alternating bi-directional manner, in a
uni-directional manner or any combination thereof. An
initiating oligonucleotide or polynucleotide sequence can
be contained in a target polynucleotide sequence and
identified by an algorithm of the invention. In this
regard, an initiating oligonucleotide or polynucleotide
sequence contained in a target polynucleotide sequence
can be either the 5' most terminal oligonucleotide, the
3' most terminal oligonucleotide, or neither the 3' nor
the 5' most terminal oligonucleotide of the target
polynucleotide sequence, depending on whether the target
polynucleotide is assembled starting from the middle
versus starting from one of the two ends. If an
initiating oligonucleotide or polynucleotide sequence
contained in a target polynucleotide sequence represents
either the 5' most terminal oligonucleotide or the 3' most
terminal oligonucleotide of the target polynucleotide, it
can encompass one overhang.

For ligation assembly of a target
polynucleotide, an initiating oligonucleotide begins
assembly by providing an anchor for hybridization of
subsequent oligonucleotides contiguous with the
initiating oligonucleotide. Thus, for ligation assembly,
an initiating oligonucleotide is a partially double-
stranded nucleic acid, thereby providing single-stranded
overhang(s) for annealing of a contiguous, double-
stranded nucleic acid molecule. For primer extension
assembly of a target polynucleotide, an initiating
oligonucleotide begins assembly by providing a template
for hybridization of subsequent oligonucleotides
contiguous with the initiating oligonucleotide. Thus,
for primer extension assembly, an initiating
oligonucleotide can be partially double-stranded or fully
double-stranded.
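
The overhangs of a partially double-stranded initiating oligonucleotide can be made concrete with a small sketch. The representation below is an assumption made for illustration and is not taken from the specification: each strand is described by the half-open interval of target-sequence positions it covers, with positions increasing in the plus strand 5' to 3' direction. The `Duplex` class and `overhangs` function are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Duplex:
    """Hypothetical model: target positions [start, end) covered by each strand."""
    plus: Tuple[int, int]
    minus: Tuple[int, int]


def overhangs(d: Duplex) -> List[Tuple[str, str, int]]:
    """List (strand, end type, length) for each single-stranded overhang."""
    found = []
    # Left end of the duplex: the plus strand's 5' end or the minus strand's 3' end.
    if d.plus[0] < d.minus[0]:
        found.append(("plus", "5'", d.minus[0] - d.plus[0]))
    elif d.minus[0] < d.plus[0]:
        found.append(("minus", "3'", d.plus[0] - d.minus[0]))
    # Right end of the duplex: the plus strand's 3' end or the minus strand's 5' end.
    if d.plus[1] > d.minus[1]:
        found.append(("plus", "3'", d.plus[1] - d.minus[1]))
    elif d.minus[1] > d.plus[1]:
        found.append(("minus", "5'", d.minus[1] - d.plus[1]))
    return found


if __name__ == "__main__":
    # Plus strand longer at both ends: a 5' overhang and a 3' overhang.
    print(overhangs(Duplex(plus=(0, 50), minus=(10, 40))))
    # Fully double-stranded: no overhangs.
    print(overhangs(Duplex(plus=(0, 50), minus=(0, 50))))
```

In this model an empty result corresponds to a fully double-stranded molecule, which per the text above could serve as an initiating oligonucleotide for primer extension assembly but would offer no overhang for ligation assembly.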

As used herein, the term "next most terminal"
oligonucleotide refers to an oligonucleotide that is
added to an extended initiating oligonucleotide at either
the 5' or the 3' end. A next most terminal
oligonucleotide can be single-stranded, partially
double-stranded or fully double-stranded. In the
sequential methods of the invention utilizing cycles of
sequentially adding the next most terminal
oligonucleotide to the extending double-stranded
oligonucleotide, the next most terminal oligonucleotide
has at least one overhang that is complementary to a 3'
or 5' overhang sequence belonging to either the plus or
minus strand of the extending double-stranded
oligonucleotide.

As used herein, the terms "5' most terminal"
and "3' most terminal" refer to a single-stranded or
double-stranded oligonucleotide or polynucleotide that
encompasses either the physical beginning or the end of a
target polynucleotide sequence. As described above, an
initiating oligonucleotide or polynucleotide used in the
sequential assembly methods of the invention can, for
example, be a 5' most terminal or a 3' most terminal
oligonucleotide.

As used herein, the term "enzymatic synthesis"
refers to assembly of polynucleotides that utilizes one
or more enzymes for functions including, for example,
polymerization, primer extension, ligation or mismatch
repair. As described herein, the polynucleotide assembly
methods of the invention can be performed by both
enzymatic synthesis and non-enzymatic synthesis.
Enzymatic primer extension refers to polynucleotide
synthesis methods that include primer extension via an
enzymatic reaction including, for example, polymerase
chain reaction (PCR) and ligase chain reaction (LCR),
which utilize thermostable polymerase and thermostable
ligase, respectively, to synthesize polynucleotides.
Furthermore, as used herein "enzymatic polymerization"
refers to assembly of a polynucleotide or oligonucleotide
that utilizes a natural or recombinant polymerase for
extension including, for example, polymerase chain
reaction (PCR).



The present invention provides fast and
efficient methods for assembly of a polynucleotide,
including entire genes, chromosomal segments or
fragments, chromosomes and genomes. Because the
invention methods are based on a completely synthetic
approach, there are no limitations, such as the
availability of existing nucleic acids or the
complexities of site-specific mutagenesis, to hinder the
construction of even very large segments of nucleic acid.
In particular, art-known methods for the synthetic
assembly of oligonucleotides into long DNA fragments
generally utilize polymerase to fill in single-stranded
gaps in annealed pairs of oligonucleotides. However,
after the polymerase reaction, each segment must be
cloned, a step which significantly delays the synthesis
of long polynucleotide fragments and greatly decreases
the efficiency of the approach. Additionally, the
approach can be used only for small DNA fragments.



Other art-known methods of polynucleotide
synthesis include PCR based techniques that involve
assembly of overlapping oligonucleotides performed by a
thermostable DNA polymerase during repeated cycles of
melting, annealing and polymerization. A key
disadvantage of PCR mediated methods is that complex
mispriming events negatively affect the correctness of a
resulting assembled polynucleotide. In addition, the low
fidelity of thermostable DNA polymerase reduces the
reliability of this technology as the number of
PCR steps increases. Other known methods for polynucleotide
synthesis involve the ligation of two or more
polynucleotide strands without use of a template and have
the disadvantage of only being able to synthesize short
genes of about 200 base pairs.

In one embodiment, the invention provides a
method of assembling a double-stranded polynucleotide
comprising a) selecting a partially double-stranded
initiating oligonucleotide, wherein the initiating
oligonucleotide comprises at least one overhang; b)
contacting the partially double-stranded initiating
oligonucleotide with a next most terminal
oligonucleotide, wherein the next most terminal
oligonucleotide is contiguous with the initiating
oligonucleotide and comprises at least one overhang, and
wherein the at least one overhang of the next most
terminal oligonucleotide is complementary to at least one
overhang of the initiating oligonucleotide; and c)
repeating (b) to sequentially add the next most terminal
oligonucleotide to the extended initiating
oligonucleotide, whereby the double-stranded
polynucleotide is synthesized.
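
Steps a) through c) can be sketched in silico as follows. This is a minimal illustration under stated assumptions, not the algorithm of the invention: the parsing scheme (plus strand oligonucleotides of length 2k, minus strand oligonucleotides offset by k), the function names, the example sequence, and the choice of single-stranded next most terminal oligonucleotides added in one direction only are all hypothetical. The sketch also assumes that each overhang matches exactly one oligonucleotide in the pool; a repetitive target would need a parsing step that checks this.

```python
# Illustrative sketch only; names and parsing scheme are assumptions.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of a 5'->3' DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def parse_target(target, k):
    """Split a target into plus oligos (2k blocks) and staggered minus oligos."""
    plus = [target[i:i + 2 * k] for i in range(0, len(target), 2 * k)]
    minus = [revcomp(target[i:i + 2 * k]) for i in range(k, len(target), 2 * k)]
    return plus, minus


def assemble(init_plus, init_minus, pool_plus, pool_minus, k):
    """Extend the initiating duplex rightward by matching overhangs."""
    minus_fp = revcomp(init_minus)  # region of the target the minus oligo pairs with
    if init_plus[-k:] != minus_fp[:k]:
        raise ValueError("initiating oligonucleotides do not anneal")
    assembled = init_plus + minus_fp[k:]
    overhang = minus_fp[k:]  # single-stranded region awaiting a partner
    pools = {"plus": list(pool_plus),
             "minus": [revcomp(m) for m in pool_minus]}
    strand = "plus"  # strand of the next most terminal oligonucleotide
    while overhang:
        matches = [fp for fp in pools[strand] if fp.startswith(overhang)]
        if not matches:
            break  # nothing in the pool is complementary to the overhang
        fp = matches[0]
        pools[strand].remove(fp)
        overhang = fp[len(overhang):]  # new overhang left by the added oligo
        assembled += overhang
        strand = "minus" if strand == "plus" else "plus"
    return assembled


if __name__ == "__main__":
    target = "ATGGCTAGCAAAGGAGAAGAACTT"  # arbitrary example sequence
    plus, minus = parse_target(target, 4)
    product = assemble(plus[0], minus[0], plus[1:], minus[1:], 4)
    print(product == target)
```

Each pass through the loop corresponds to one repetition of step b): the oligonucleotide whose sequence is complementary to the current overhang is annealed, and the unpaired remainder of that oligonucleotide becomes the overhang for the next cycle.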

In another embodiment, the invention provides a
method of synthesizing a target polynucleotide sequence
comprising: a) providing a target polynucleotide
sequence; b) identifying at least one initiating
polynucleotide present in the target polynucleotide which
includes at least one plus strand oligonucleotide
annealed to at least one minus strand oligonucleotide
resulting in a partially double-stranded polynucleotide
comprised of a 5' overhang and a 3' overhang; c)
identifying a second polynucleotide present in the target
polynucleotide which is contiguous with the initiating
polynucleotide and includes at least one plus strand
oligonucleotide annealed to at least one minus strand
oligonucleotide resulting in a partially double-stranded
polynucleotide comprised of a 5' overhang, a 3' overhang,
or a 5' overhang and a 3' overhang, where at least one
overhang of the second polynucleotide is complementary to
at least one overhang of the initiating polynucleotide;
d) identifying a third polynucleotide present in the
target polynucleotide which is contiguous with the
initiating sequence and includes at least one plus strand
oligonucleotide annealed to at least one minus strand
oligonucleotide resulting in a partially double-stranded
polynucleotide comprised of a 5' overhang, a 3' overhang,
or a 5' overhang and a 3' overhang, where at least one
overhang of the third polynucleotide is complementary to
at least one overhang of the initiating polynucleotide
which is not complementary to an overhang of the second
polynucleotide; e) contacting the initiating
polynucleotide with the second polynucleotide and the
third polynucleotide under conditions and for such time
suitable for annealing, the contacting resulting in a
contiguous double-stranded polynucleotide, resulting in
the bi-directional extension of the initiating
polynucleotide; f) in the absence of primer extension,
optionally contacting the mixture of e) with a ligase
under conditions suitable for ligation; and g) optionally
repeating (b) through (f) to sequentially add double-
stranded polynucleotides to the extended initiating
polynucleotide through repeated cycles of annealing and
ligation, whereby a target polynucleotide is synthesized.
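
The order in which contiguous polynucleotides are added during bi-directional extension can be sketched as a simple schedule. The sketch assumes, for illustration only, that the target has already been parsed into contiguous partially double-stranded blocks numbered from the 5' end of the plus strand, and that each annealing and ligation cycle adds the nearest unused neighbor on each side of the growing molecule. The function name `bidirectional_schedule` is hypothetical.

```python
from typing import List


def bidirectional_schedule(n_blocks: int, init: int) -> List[List[int]]:
    """Return, per cycle, the indices of the blocks annealed in that cycle.

    Block `init` is the initiating polynucleotide. Each cycle adds the nearest
    unused neighbor on the left and on the right, where one exists.
    """
    if not 0 <= init < n_blocks:
        raise ValueError("initiating block index out of range")
    cycles = []
    left, right = init - 1, init + 1
    while left >= 0 or right < n_blocks:
        cycle = []
        if left >= 0:
            cycle.append(left)    # extends toward the 5' end of the plus strand
            left -= 1
        if right < n_blocks:
            cycle.append(right)   # extends toward the 3' end of the plus strand
            right += 1
        cycles.append(cycle)
    return cycles


if __name__ == "__main__":
    print(bidirectional_schedule(7, 3))  # start from the middle block
    print(bidirectional_schedule(5, 0))  # start from a terminal block
```

Starting from a terminal block degenerates to uni-directional extension, and starting off-center gives bi-directional cycles followed by uni-directional ones, which corresponds to the "any combination thereof" language used for the direction of extension.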

The invention further provides a method of
assembling a target polynucleotide comprising: a)
providing a target polynucleotide sequence;
b) identifying at least one partially double-stranded
initiating oligonucleotide present in the target
polynucleotide, wherein the initiating oligonucleotide
comprises a 5' overhang and a 3' overhang; c) identifying
a next most terminal oligonucleotide present in the
target polynucleotide, wherein the next most terminal
oligonucleotide is contiguous with the initiating
oligonucleotide and comprises a 5' overhang and a 3'
overhang, wherein at least one overhang of the next most
terminal oligonucleotide is complementary to at least one
overhang of the initiating oligonucleotide; d) contacting
the initiating oligonucleotide with the next most
terminal oligonucleotide under such conditions and for
such time suitable for annealing, wherein the initiating
sequence is extended; and e) optionally repeating (a)
through (d) to sequentially add the next most terminal
oligonucleotide to the extended initiating
oligonucleotide, whereby a target polynucleotide is
synthesized.



The invention also provides a method of
assembling a polynucleotide comprising: a) providing a
partially double-stranded initiating oligonucleotide,
wherein the initiating oligonucleotide comprises
a 5' overhang and a 3' overhang; c) identifying a next
most terminal oligonucleotide, wherein the next most
terminal oligonucleotide is contiguous with the
initiating oligonucleotide and comprises a 5' overhang
and a 3' overhang, wherein at least one overhang of the
next most terminal oligonucleotide is complementary to at
least one overhang of the initiating oligonucleotide; d)
contacting the initiating oligonucleotide with the next
most terminal oligonucleotide under such conditions and
for such time suitable for annealing, wherein the
initiating sequence is extended; and e) optionally
repeating (a) through (d) to sequentially add the next
most terminal oligonucleotides to the extended initiating
oligonucleotide, whereby a polynucleotide is synthesized.

The invention further provides a method of
synthesizing a target polynucleotide comprising: a)
providing a target polynucleotide sequence derived from a
model sequence; b) identifying at least one initiating
polynucleotide sequence present in the target
polynucleotide sequence of a), wherein the initiating
polynucleotide contains: 1) a first plus strand
oligonucleotide; 2) a second plus strand oligonucleotide
contiguous with the first plus strand oligonucleotide;
and 3) a minus strand oligonucleotide including a first
contiguous sequence which is at least partially
complementary to the first plus strand oligonucleotide
and second contiguous sequence which is at least
partially complementary to the second plus strand
oligonucleotide; c) annealing the first plus strand
oligonucleotide and the second plus strand
oligonucleotide to the minus strand oligonucleotide of b)
resulting in a partially double-stranded initiating
polynucleotide including a 5' overhang and a 3' overhang;
d) identifying a second polynucleotide sequence present
in the target polynucleotide sequence of a), wherein the
second polynucleotide sequence is contiguous with the
initiating polynucleotide sequence and contains: 1) a
first plus strand oligonucleotide; 2) a second plus
strand oligonucleotide contiguous with the first plus
strand oligonucleotide; and 3) a minus strand
oligonucleotide comprising a first contiguous sequence
which is at least partially complementary to the first
plus strand oligonucleotide and second contiguous
sequence which is at least partially complementary to the
second plus strand oligonucleotide; e) annealing the
first plus strand oligonucleotide and the second plus
strand oligonucleotide to the minus strand
oligonucleotide of d) resulting in a partially double-
stranded second polynucleotide, wherein at least one
overhang of the second polynucleotide is complementary to
at least one overhang of the initiating polynucleotide;
f) identifying a third polynucleotide present in the
target polynucleotide of a), wherein the third
polynucleotide is contiguous with the initiating sequence
and contains: 1) a first plus strand oligonucleotide; 2)
a second plus strand oligonucleotide contiguous with the
first plus strand oligonucleotide; and 3) a minus strand
oligonucleotide comprising a first contiguous sequence
which is at least partially complementary to the first
plus strand oligonucleotide and second contiguous
sequence which is at least partially complementary to the
second plus strand oligonucleotide; g) annealing the
first plus strand oligonucleotide and the second plus
strand oligonucleotide to the minus strand
oligonucleotide of f) resulting in a partially double-
stranded third polynucleotide, wherein at least one
overhang of the third polynucleotide is complementary to
at least one overhang of the initiating polynucleotide
and not complementary to an overhang of the second
polynucleotide; h) contacting the initiating
polynucleotide of c) with the second polynucleotide of e)
and the third polynucleotide of g) under conditions and
for such time suitable for annealing, the contacting
resulting in a contiguous double-stranded polynucleotide,
wherein the initiating sequence is extended bi-
directionally; i) in the absence of primer extension,
optionally contacting the mixture of h) with a ligase
under conditions suitable for ligation; and j) optionally
repeating b) through i) to sequentially add double-
stranded polynucleotides to the extended initiating
polynucleotide through repeated cycles of annealing and
ligation, whereby a target polynucleotide is synthesized.

The invention further provides a method of
synthesizing a target polynucleotide comprising: a)
providing a target polynucleotide sequence derived from a
model sequence; b) identifying at least one initiating
polynucleotide sequence present in the target
polynucleotide sequence of a), wherein the initiating
polynucleotide includes 1) a first plus strand
oligonucleotide; 2) a second plus strand oligonucleotide
contiguous with the first plus strand oligonucleotide;
and 3) a minus strand oligonucleotide including a first
contiguous sequence which is at least partially
complementary to the first plus strand oligonucleotide
and second contiguous sequence which is at least
partially complementary to the second plus strand
oligonucleotide; c) annealing the first plus strand
oligonucleotide and the second plus strand
oligonucleotide to the minus strand oligonucleotide of b)
resulting in a partially double-stranded initiating
polynucleotide including a 5' overhang and a 3' overhang;
d) identifying a second polynucleotide sequence present
in the target polynucleotide sequence of a), wherein the
second polynucleotide sequence is contiguous with the
initiating polynucleotide sequence and contains: 1) a
first plus strand oligonucleotide; 2) a second plus
strand oligonucleotide contiguous with the first plus
strand oligonucleotide; and 3) a minus strand
oligonucleotide comprising a first contiguous sequence
which is at least partially complementary to the first
plus strand oligonucleotide and second contiguous
sequence which is at least partially complementary to the
second plus strand oligonucleotide; e) annealing the
first plus strand oligonucleotide and the second plus
strand oligonucleotide to the minus strand
oligonucleotide of d) resulting in a partially double-
stranded second polynucleotide, wherein at least one
overhang of the second polynucleotide is complementary to
at least one overhang of the initiating polynucleotide;
h) contacting the initiating polynucleotide of c) with
the second polynucleotide of e) under conditions and for
such time suitable for annealing, the contacting
resulting in a contiguous double-stranded polynucleotide,
wherein the initiating sequence is extended; i) in the
absence of primer extension, optionally contacting the
mixture of h) with a ligase under conditions suitable for
ligation; and j) optionally repeating b) through i) to
sequentially add double-stranded polynucleotides to the
extended initiating polynucleotide through repeated
cycles of annealing and ligation, whereby a target
polynucleotide is synthesized.
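
The partially double-stranded unit described in steps b) and c) can be illustrated with a short sketch. It assumes, purely for illustration, that both plus strand oligonucleotides have the same even length and that the minus strand oligonucleotide covers the second half of the first plus strand oligonucleotide and the first half of the second. The helper name `make_duplex_unit` and the example sequence are hypothetical and are not taken from the specification.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def make_duplex_unit(target, start, oligo_len):
    """Cut one partially double-stranded unit out of `target` (hypothetical helper).

    Assumes an even `oligo_len` and a minus strand oligonucleotide offset by
    half an oligonucleotide, which leaves a single-stranded overhang at each end.
    """
    if oligo_len < 2 or oligo_len % 2:
        raise ValueError("oligo_len must be an even number >= 2")
    if start < 0 or start + 2 * oligo_len > len(target):
        raise ValueError("unit does not fit inside the target sequence")
    half = oligo_len // 2
    plus1 = target[start:start + oligo_len]
    plus2 = target[start + oligo_len:start + 2 * oligo_len]
    # The minus strand oligonucleotide bridges the junction of plus1 and plus2.
    minus = reverse_complement(target[start + half:start + half + oligo_len])
    return {
        "plus1": plus1,
        "plus2": plus2,
        "minus": minus,
        "overhang_5prime": plus1[:half],  # unpaired 5' end of plus1
        "overhang_3prime": plus2[half:],  # unpaired 3' end of plus2
    }


unit = make_duplex_unit("ATGGCTAGCAAAGGAGAAGAACTT", 4, 8)
print(unit)
```

The two overhangs reported by the sketch are the regions to which a neighbouring unit with complementary overhangs could anneal.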



The invention further provides a method of
synthesizing a target polynucleotide comprising: a)
providing a first polynucleotide including 1) a first
plus strand oligonucleotide; 2) a second plus strand
oligonucleotide contiguous with the first plus strand
oligonucleotide; and 3) a minus strand oligonucleotide
including a first contiguous sequence which is at least
partially complementary to the first plus strand
oligonucleotide and second contiguous sequence which is
at least partially complementary to the second plus
strand oligonucleotide; b) annealing the first plus
strand oligonucleotide and the second plus strand
oligonucleotide to the minus strand oligonucleotide of a)
resulting in a partially double-stranded initiating
polynucleotide including a 5' overhang and a 3' overhang;
c) identifying a second polynucleotide sequence present
in the target polynucleotide sequence of a), wherein the
second polynucleotide sequence is contiguous with the
initiating polynucleotide sequence and includes: 1) a
first plus strand oligonucleotide; 2) a second plus
strand oligonucleotide contiguous with the first plus
strand oligonucleotide; and 3) a minus strand
oligonucleotide comprising a first contiguous sequence
which is at least partially complementary to the first
plus strand oligonucleotide and second contiguous
sequence which is at least partially complementary to the
second plus strand oligonucleotide; e) annealing the
first plus strand oligonucleotide and the second plus
strand oligonucleotide to the minus strand
oligonucleotide of d) resulting in a partially double-
stranded second polynucleotide, wherein at least one
overhang of the second polynucleotide is complementary to
at least one overhang of the initiating polynucleotide;
h) contacting the initiating polynucleotide of c) with
the second polynucleotide of e) under conditions and for
such time suitable for annealing, the contacting
resulting in a contiguous double-stranded polynucleotide,
wherein the initiating sequence is extended; i) in the
absence of primer extension, optionally contacting the
mixture of h) with a ligase under conditions suitable for
ligation; and j) optionally repeating b) through i) to
sequentially add double-stranded polynucleotides to the
extended initiating polynucleotide through repeated
cycles of annealing and ligation, whereby a target
polynucleotide is synthesized.



The invention further provides a method of
synthesizing a target polynucleotide comprising: a)
providing a first plus strand oligonucleotide; b) a first
minus strand oligonucleotide which is at least partially
complementary to the first plus strand oligonucleotide;
c) annealing the first plus strand oligonucleotide to the
first minus strand oligonucleotide resulting in a
partially double-stranded initiating polynucleotide
including at least one overhang; d) adding a next most
terminal single-stranded oligonucleotide that is at least
partially complementary to the overhang of the double-
stranded initiating polynucleotide; e) annealing the next
most terminal single-stranded oligonucleotide to the
double-stranded initiating polynucleotide; d) resulting
in a partially double-stranded second polynucleotide,
including at least one overhang; h) in the absence of
primer extension, optionally contacting the mixture of h)
with a ligase under conditions suitable for ligation; and
j) optionally repeating b) through i) to sequentially add
single-stranded polynucleotides to the extended
initiating polynucleotide through repeated cycles of
annealing and ligation, whereby a target polynucleotide
is synthesized.

In another embodiment, the invention provides a
method for synthesizing a target polynucleotide,
comprising: a) providing a target polynucleotide sequence
derived from a model sequence; b) identifying at least
one initiating polynucleotide present in the target
polynucleotide which includes at least one plus strand
oligonucleotide annealed to at least one minus strand
oligonucleotide; c) contacting the initiating
polynucleotide under conditions suitable for primer
annealing with a first oligonucleotide having partial
complementarity to the 3' portion of the plus strand of
the initiating polynucleotide, and a second
oligonucleotide having partial complementarity to the 3'
portion of the minus strand of the initiating
polynucleotide; d) catalyzing under conditions suitable
for primer extension: 1) polynucleotide synthesis from
the 3'-hydroxyl of the plus strand of the initiating
polynucleotide; 2) polynucleotide synthesis from the
3'-hydroxyl of the annealed first oligonucleotide; 3)
polynucleotide synthesis from the 3'-hydroxyl of the
minus strand of the initiating polynucleotide; and 4)
polynucleotide synthesis from the 3'-hydroxyl of the
annealed second oligonucleotide, resulting in the bi-
directional extension of the initiating sequence thereby
forming a nascent extended initiating polynucleotide; e)
contacting the extended initiating polynucleotide of d)
under conditions suitable for primer annealing with a
third oligonucleotide having partial complementarity to
the 3' portion of the plus strand of the extended
initiating polynucleotide, and a fourth oligonucleotide
having partial complementarity to the 3' portion of the
minus strand of the extended initiating polynucleotide;
f) catalyzing under conditions suitable for primer
extension: 1) polynucleotide synthesis from the
3'-hydroxyl of the plus strand of the extended initiating
polynucleotide; 2) polynucleotide synthesis from the
3'-hydroxyl of the annealed third oligonucleotide; 3)
polynucleotide synthesis from the 3'-hydroxyl of the
minus strand of the extended initiating polynucleotide;
and 4) polynucleotide synthesis from the 3'-hydroxyl of
the annealed fourth oligonucleotide, resulting in the bi-
directional extension of the initiating sequence thereby
forming a nascent extended initiating polynucleotide; and
g) optionally repeating e) through f) as desired,
resulting in formation of the target polynucleotide
sequence.

The invention further provides a method for 
isolating a target polypeptide encoded by a target 
polynucleotide generated by a method of the invention 
comprising: a) incorporating the target polynucleotide in 
an expression vector; b) introducing the expression 
vector into a suitable host cell; c) culturing the cell 
under conditions and for such time as to promote the 
expression of the target polypeptide encoded by the 
target polynucleotide; and d) isolating the target 
polypeptide.

The invention further provides a method of
synthesizing a target polynucleotide comprising: a)
providing a target polynucleotide sequence derived from a
model sequence; b) chemically synthesizing a plurality of
single-stranded oligonucleotides each of which is
partially complementary to at least one oligonucleotide
present in the plurality, where the sequence of the
plurality of oligonucleotides is a contiguous sequence of
the target polynucleotide; c) contacting the partially
complementary oligonucleotides under conditions and for
such time suitable for annealing, the contacting
resulting in a plurality of partially double-stranded
polynucleotides, where each double-stranded
polynucleotide includes a 5' overhang and a 3' overhang;
d) identifying at least one initiating polynucleotide
derived from the model sequence present in the plurality
of double-stranded polynucleotides; e) in the absence of
primer extension, subjecting a mixture including the
initiating polynucleotide and 1) a double-stranded
polynucleotide that will anneal to the 5' portion of the
initiating polynucleotide; 2) a double-stranded
polynucleotide that will anneal to the 3' portion of the
initiating polynucleotide; and 3) a DNA ligase under
conditions suitable for annealing and ligation, wherein
the initiating polynucleotide is extended bi-
directionally; f) sequentially annealing double-stranded
polynucleotides to the extended initiating polynucleotide
through repeated cycles of annealing, whereby the target
polynucleotide is produced.
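
The order in which fragments join a bi-directionally extended initiating polynucleotide can be sketched as follows. This is a minimal illustration that assumes the partially double-stranded fragments are simply numbered along the plus strand and that one fragment is added on each side per annealing cycle; the function name `bidirectional_order` is hypothetical and the scheduling rule is an assumption, not a requirement of the method.

```python
def bidirectional_order(n_fragments, start):
    """List, cycle by cycle, which fragments join the initiating fragment.

    Fragments are numbered 0..n_fragments-1 along the plus strand (5' to 3').
    Each cycle is a (five_prime_side, three_prime_side) pair; None means
    that side has already reached the end of the target.
    """
    if not 0 <= start < n_fragments:
        raise ValueError("start must index one of the fragments")
    cycles = []
    left, right = start - 1, start + 1
    while left >= 0 or right < n_fragments:
        cycles.append((left if left >= 0 else None,
                       right if right < n_fragments else None))
        left -= 1
        right += 1
    return cycles


print(bidirectional_order(5, 2))  # [(1, 3), (0, 4)]
print(bidirectional_order(4, 0))  # [(None, 1), (None, 2), (None, 3)]
```

Under these assumptions, an initiating fragment near the middle of the target needs fewer cycles than one at either end, because both sides grow in every cycle.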

In addition to the sequential assembly methods
described above, the invention also provides set assembly
methods, in which two sets of oligonucleotides are
synthesized and subsequently annealed. In this regard,
the invention also provides a method of assembling a
double-stranded polynucleotide, comprising: (a)
chemically synthesizing a first set of oligonucleotides
of at least 25 bases comprising a first strand of a
double-stranded polynucleotide; (b) chemically
synthesizing a second set of oligonucleotides of at least
25 bases comprising a second complementary strand of the
double-stranded polynucleotide, each of the
oligonucleotides within the second set of the
oligonucleotides overlapping with at least one
oligonucleotide within the first set of the
oligonucleotides, and (c) annealing the first and second
sets of oligonucleotides to produce a double-stranded
polynucleotide in the absence of enzymatic synthesis.



A double-stranded polynucleotide produced by
the assembly methods of the invention can be, for
example, about 100, 200, 300, 400, 500, 600, 700, 800,
900, 1 x 10^3, 5 x 10^3, 1 x 10^4, 5 x 10^4, 1 x 10^5,
5 x 10^5, 1 x 10^6, 5 x 10^6, 1 x 10^7, 5 x 10^7,
1 x 10^8, 5 x 10^8, 1 x 10^9, 5 x 10^9 or more base
pairs in length. As described above, in one embodiment
of the invention, two sets of oligonucleotides can be
generated such that the entire plus and minus strands of
the gene are represented. The oligonucleotide sets can be
comprised of oligonucleotides of between about 15 and 150
bases, between about 20 and 100 bases, between about 25
and 75 bases, or between about 30 and 50 bases. Specific
lengths include, for example, 15, 16, 17, 18, 19, 20, 21,
22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35,
36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49,
50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63,
64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77,
78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91,
92, 93, 94, 95, 96, 97, 98, 99, 100, 110, 120, 130, 150
or more bases.
Depending on the size, the overlap between the
oligonucleotides of the two sets may be designed to be
about 50 percent of the length of the oligonucleotide or
between about 5 and 75 bases per oligonucleotide pair,
for example, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20,
21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34,
35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48,
49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62,
63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 80,
90, 100 or more bases. The sets can be designed such
that complementary pairing with the first and second sets
results in overlap of paired sequences, as each
oligonucleotide of the first set is complementary with
regions from two oligonucleotides of the second set, with
the possible exception of the terminal oligonucleotides.
The first and the second sets of oligonucleotides can
optionally be annealed in a single mixture and treated
with a ligating enzyme.
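
As a rough illustration of the overlap design described above, the following sketch splits a target into a first (plus strand) set and a second (minus strand) set whose members are offset by half an oligonucleotide, giving an overlap of about 50 percent. Equal oligonucleotide lengths, the half-length offset, the function name `design_sets`, and the example sequence are all assumptions made for the example.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def design_sets(target, oligo_len):
    """Split `target` into plus strand oligos and half-offset minus strand oligos."""
    if oligo_len < 2 or oligo_len % 2:
        raise ValueError("oligo_len must be an even number >= 2")
    half = oligo_len // 2
    # First set: consecutive, non-overlapping pieces of the plus strand.
    plus = [target[i:i + oligo_len] for i in range(0, len(target), oligo_len)]
    # Second set: windows shifted by half an oligo, so each internal window
    # bridges the junction between two neighbouring plus strand oligos.
    minus = [reverse_complement(target[i:i + oligo_len])
             for i in range(half, len(target), oligo_len)]
    return plus, minus


target = "ACGTTGCA" * 5  # 40-base example sequence
plus_set, minus_set = design_sets(target, 10)
print(plus_set)
print(minus_set)
```

In this sketch the last minus strand oligonucleotide is shorter than the others and pairs with only one plus strand oligonucleotide, which corresponds to the stated exception for terminal oligonucleotides.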

The invention further provides a method of
assembling a double-stranded replication-competent
polynucleotide, comprising: (a) chemically synthesizing a
first set of oligonucleotides comprising a first strand
of a double-stranded replication-competent polynucleotide
having a coding region and a regulatory region; (b)
chemically synthesizing a second set of oligonucleotides
comprising a second complementary strand of the double-
stranded replication-competent polynucleotide having a
coding region and a regulatory region, each of the
oligonucleotides within the second set of the
oligonucleotides overlapping with at least one
oligonucleotide within the first set of the
oligonucleotides, and (c) annealing the first and second
sets of oligonucleotides to produce a double-stranded
replication-competent polynucleotide having a coding
region and a regulatory region.

The set assembly methods of the invention can
further be combined with the sequential assembly methods
to assemble a double-stranded polynucleotide. As used
herein, the term "component polynucleotide" when used in
reference to a method of assembly that combines the set
and sequential assembly methods provided by the
invention, refers to a polynucleotide that is prepared by
synthesizing and annealing two separate sets of
oligonucleotides. A component polynucleotide is
subsequently incorporated into a larger polynucleotide
via the sequential assembly methods provided by the
invention.

Thus, the invention provides a method of
assembling a double-stranded polynucleotide comprising:
a) chemically synthesizing a first set of
oligonucleotides comprising a first strand of a double-
stranded polynucleotide; b) chemically synthesizing a
second set of oligonucleotides comprising a second
complementary strand of the double-stranded
polynucleotide, each of the oligonucleotides within the
second set of the oligonucleotides overlapping with at
least one oligonucleotide within the first set of the
oligonucleotides; and c) annealing the first and second
sets of oligonucleotides to produce a partially double-
stranded component polynucleotide; d) repeating steps (a)
through (c) to prepare a series of partially double-
stranded component polynucleotides; e) selecting at least
one partially double-stranded component polynucleotide
present in the target polynucleotide to serve as the
initiating polynucleotide, wherein the initiating
polynucleotide comprises a 5' overhang and a 3' overhang;
f) adding the next most terminal component
polynucleotide, wherein the next most terminal component
polynucleotide comprises at least one overhang that is
complementary to at least one overhang of the initiating
polynucleotide; g) contacting the initiating
polynucleotide with the next most terminal component
polynucleotide under such conditions and for such time
suitable for annealing, wherein the initiating sequence
is extended; and h) optionally repeating (e) through (g)
to sequentially add the next most terminal component
polynucleotides to the extended initiating
polynucleotide, whereby a target polynucleotide is
assembled in the absence of enzymatic synthesis.

The assembly methods of the invention can
encompass an initial step of providing or selecting a
target polynucleotide to be assembled or can be performed
without a predetermined target to assemble a
polynucleotide of random sequence, for example, to
generate a random library. Alternatively, the assembly
methods of the invention can be utilized to assemble a
polynucleotide that encompasses a target sequence, but
also contains a random sequence, for example, to generate
a biased library.

The invention provides a computer
program, stored on a computer-readable medium, for
generating a target polynucleotide sequence derived from
a model sequence, the computer program comprising
instructions for causing a computer system to: a)
identify an initiating polynucleotide sequence contained
in the target polynucleotide sequence; b) parse the
target polynucleotide sequence into multiple distinct,
partially complementary, oligonucleotides; and c) control
assembly of the target polynucleotide sequence by
controlling the bi-directional extension of the
initiating polynucleotide sequence by the sequential
addition of partially complementary oligonucleotides
resulting in a contiguous double-stranded polynucleotide.

The invention further provides a method for
automated synthesis of a target polynucleotide sequence,
including: a) providing the user with an opportunity to
communicate a desired target polynucleotide sequence; b)
allowing the user to transmit the desired target
polynucleotide sequence to a server; c) providing the
user with a unique designation; d) obtaining the
transmitted target polynucleotide sequence provided by
the user.

The invention further provides a method for
automated synthesis of a polynucleotide sequence,
including: a) providing a user with a mechanism for
communicating a model polynucleotide sequence; b)
optionally providing the user with an opportunity to
communicate at least one desired modification to the
model sequence if desired; c) allowing the user to
transmit the model sequence and desired modification to a
server; d) providing the user with a unique designation;
e) obtaining the transmitted model sequence and optional
desired modification provided by the user; f) inputting
into a programmed computer, through an input device, data
including at least a portion of the model polynucleotide
sequence; g) determining, using the processor, the
sequence of the model polynucleotide sequence containing
the desired modification; h) further determining, using
the processor, at least one initiating polynucleotide
sequence present in the model polynucleotide sequence; i)
selecting, using the processor, a model for synthesizing
the modified model polynucleotide sequence based on the
position of the initiating sequence in the model
polynucleotide sequence; and j) outputting, to the output
device, the results of the at least one determination.

Unless otherwise defined, all technical and
scientific terms used herein have the same meaning as
commonly understood by one of ordinary skill in the art
to which this invention belongs. For example, the one
letter and three letter abbreviations for amino acids and
the one-letter abbreviations for nucleotides are commonly
understood. Although methods and materials similar or
equivalent to those described herein can be used in the
practice or testing of the present invention, suitable
methods and materials are described below. In addition,
the materials, methods and examples are illustrative only
and not intended to be limiting. All publications,
patent applications, patents, and other references
mentioned herein are incorporated by reference in their
entirety. In case of conflict, the present
specification, including definitions, will control.

The methods described above are collectively
referred to as the polynucleotide assembly methods of the
invention. The polynucleotide assembly methods of the
invention can be performed in combination with or in the
absence of enzymatic synthesis methods. Enzymatic
synthesis methods include, for example, enzymatic
polymerization, enzymatic ligation, enzymatic mismatch
repair and other enzymatic functions that can be utilized
in the polynucleotide assembly methods of the invention.

As described above, in the invention methods of
polynucleotide assembly the extended polynucleotide can
be contacted with a next most terminal oligonucleotide
under conditions and for such time suitable for
annealing, the contacting resulting in a contiguous
double-stranded polynucleotide, wherein the initiating
sequence is extended. Subsequent next most terminal
oligonucleotides can be added sequentially to the
extended initiating polynucleotide through repeated
cycles of annealing and ligation, whereby a target
polynucleotide is assembled.



The polynucleotide assembly methods of the
invention can include the addition of MutS during the
polynucleotide assembly. MutS is a bacterial protein
involved in DNA mismatch repair that recognizes and
repairs numerous errors, including base mismatches,
unpaired bases, and small insertion or deletion loops.
MutS functions by binding the mismatched base pairs
within double-stranded polynucleotides and can be
utilized in the methods of the invention to prevent
incorporation of mismatched oligonucleotides into the
extending double-stranded polynucleotide. In particular,
if two oligonucleotides anneal that have a single base
mismatch, MutS binds to the annealed oligonucleotides at
the mismatch position, thereby physically preventing the
ligase enzyme from binding to and ligating adjacent
oligonucleotides. As a consequence of MutS binding,
oligonucleotides containing a mismatched base will not be
incorporated into the extending double-stranded
polynucleotide. Thus, the polynucleotide assembly
methods of the invention, which include the set assembly
and the sequential assembly methods described herein, can
encompass addition of MutS to the reaction mixture
during, for example, the pooling or ligation steps. In
the methods of polynucleotide assembly encompassing two
sets of oligonucleotides provided by the invention, the
first and second sets of oligonucleotides can be annealed
in the presence of MutS protein to assemble a double-
stranded polynucleotide. In general, in the sequential
or set assembly methods described herein, MutS can be
added to the primary reaction mixture or pool and will be
present in all subsequent assembly steps.
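
The logical effect of mismatch screening can be pictured with a small sketch. MutS itself is a protein acting in the reaction mixture; the code below only models the outcome described above, in which an annealed pair containing a mismatch is withheld from ligation. The function names `mismatch_positions` and `screen_for_ligation` and the example sequences are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def mismatch_positions(plus_region, minus_oligo):
    """Positions on the plus strand where the annealed pair is not Watson-Crick."""
    paired = reverse_complement(minus_oligo)
    if len(paired) != len(plus_region):
        raise ValueError("annealed regions must have equal length")
    return [i for i, (a, b) in enumerate(zip(plus_region, paired)) if a != b]


def screen_for_ligation(plus_region, candidates):
    """Keep only candidates that pair perfectly with `plus_region`.

    A stand-in for the described outcome: a mismatched duplex is bound
    at the mismatch and therefore is not ligated.
    """
    return [c for c in candidates if not mismatch_positions(plus_region, c)]


overhang = "ACGTAC"
perfect = reverse_complement("ACGTAC")     # pairs at every position
one_mismatch = reverse_complement("ACCTAC")  # differs at position 2
print(mismatch_positions(overhang, one_mismatch))
print(screen_for_ligation(overhang, [perfect, one_mismatch]))
```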

Homologues of Escherichia coli MutS protein are
found in almost every organism. In prokaryotes, MutS
proteins originate from a single gene, while eukaryotes
contain multiple mutS homologue (msh) genes.
Thermostable MutS is derived from the thermophilic
bacterium Thermus aquaticus and it has 63% identity with
the E. coli MutS protein and 55% identity with the human
homolog protein MSH2. Thermostable MutS can bind
mismatched oligonucleotides at up to 70°C and is
particularly useful for practicing the claimed methods.
Thermostable MutS is commercially available from a
variety of sources, for example, Epicentre Technologies,
Ecogen S.R.L., Madrid, Spain, and can be used according
to manufacturer's instructions, for example, by adding
0.1 µg per 50 µl of reaction mix.

The polynucleotide assembly methods of the
invention provide several advantages over prior art-known
methods of polynucleotide synthesis. The polynucleotide
assembly methods allow for assembly of large double-
stranded polynucleotides and eliminate the requirement
for subsequent cloning and ligation into a vector that
confers replication competence. Instead, the
polynucleotide assembly methods of the invention enable
the efficient assembly of polynucleotides of a size
sufficient to encompass regulatory regions as well as
distant cis- and trans-acting elements necessary for
replication. Thus, the polynucleotides assembled by the
methods of the invention can contain, for example, a
protein coding region, promoter, translational signal,
origin of replication, regulatory elements and
polyadenylation signal. By providing the ability to
assemble replication-competent polynucleotides due to
the feasibility of assembling large molecules, the
invention methods allow for assembly of polynucleotides
that can be directly transferred to a host cell, for
example, by transformation of a bacterial host, without
intermediate cloning steps.

The sequential polynucleotide assembly methods
of the invention further reduce the error rate observed
with methods that require hybridization of pools of large
numbers of oligonucleotides. In addition, the sequential
polynucleotide assembly methods of the invention can be
performed with large oligonucleotides that have an
overhang of about 50 percent of their length so as to
result in an about 50 percent overlap upon hybridization
with the corresponding complementary overhang of the
extended initiating oligonucleotide. The sequential
polynucleotide assembly methods of the invention
eliminate the need for purification and allow for
systematic assembly of identical sized double-stranded or
single-stranded oligonucleotides. The assembly methods
of the invention also can be performed with double-
stranded or single-stranded oligonucleotides of non-
identical sizes. In addition, the sequential
polynucleotide assembly methods of the invention avoid
mismatch problems associated with small repeated and
complementary sequences encountered in traditional
pooling methods.

In one embodiment, an initiating polynucleotide
of the invention can be bound to a solid support for
improved efficiency. The solid phase allows for the
efficient separation of the assembled target
polynucleotide from other components of the reaction.
Different supports can be applied in the method. For
example, supports can be magnetic latex beads or magnetic
controlled pore glass beads that allow the desirable
product to be magnetically separated from the reaction
mixture. Binding the initiating polynucleotide to such
beads can be accomplished by a variety of known methods,
for example carbodiimide treatment (Gilham, Biochemistry
7:2809-2813 (1968); Mizutani and Tachibana, J.
Chromatography 356:202-205 (1986); Wolf et al., Nucleic
Acids Res. 15:2911-2926 (1987); Musso, Nucleic Acids Res.
15:5353-5372 (1987); Lund et al., Nucleic Acids Res.
16:10861-10880 (1988)).

The initiating polynucleotide attached to the
solid phase can act as an anchor for the continued
synthesis of the target polynucleotide. Assembly can be
accomplished by addition of contiguous polynucleotides
together with ligase for ligation assembly or by addition
of oligonucleotides together with polymerase for primer
extension assembly. After the appropriate incubation
time, unbound components of the method can be washed out
and the reaction can be repeated again to improve the
efficiency of template utilization. Alternatively,
another set of polynucleotides or oligonucleotides can be
added to continue the assembly.

The solid phase, to be efficiently used for the
synthesis, can contain pores with sufficient room for
synthesis of the long nucleic acid molecules. The solid
phase can be composed of material that cannot non-
specifically bind any undesired components of the
reaction. One way to solve the problem is to use
controlled pore glass beads appropriate for long DNA
molecules. The initiating polynucleotide can be attached
to the beads through a long connector. The role of the
connector is to position the initiating polynucleotide at
a desirable distance from the surface of the solid
support.

The method of the invention further includes
identifying a next most terminal oligonucleotide present
in the target polynucleotide, which is contiguous with
the initiating polynucleotide. A next most terminal
oligonucleotide can include at least one plus strand
oligonucleotide annealed to at least one minus strand
oligonucleotide resulting in a partially double-stranded
oligonucleotide comprising a 5' overhang, a 3'
overhang, or a 5' overhang and a 3' overhang, where at
least one overhang of the next most terminal
oligonucleotide is complementary to at least one overhang
of the extended initiating polynucleotide.
Alternatively, a next most terminal oligonucleotide can
be single-stranded and include a region, referred to as
an overhang herein, complementary to at least one
overhang of the extended initiating polynucleotide. Two
or more oligonucleotides having complementary regions,
where they are permitted, will "anneal" (i.e., base pair)
under the appropriate conditions, thereby producing a
double-stranded region. In order to anneal (i.e.,
hybridize), oligonucleotides must be at least partially
complementary. The term "complementary to" is used
herein in relation to nucleotides to mean a nucleotide
that will base pair with another specific nucleotide.
Thus adenosine triphosphate is complementary to uridine
triphosphate or thymidine triphosphate and guanosine
triphosphate is complementary to cytidine triphosphate.

As used herein, a 5' or 3' "overhang" means a 
single-stranded region on the 5' or 3', or 5' and 3', end 
of a double-stranded or single-stranded polynucleotide or 
of a double-stranded or single-stranded oligonucleotide 
that provides a means for the subsequent annealing of a 
contiguous polynucleotide or oligonucleotide containing 
an overhang that is complementary to the overhang of the 
contiguous polynucleotide or oligonucleotide. Depending 
on the application envisioned, one will desire to employ 
varying conditions of annealing to achieve varying 
degrees of annealing selectivity. 

For applications requiring high selectivity,
one typically will desire to employ relatively stringent
conditions to form the hybrids, e.g., one will select
relatively low salt and/or high temperature conditions,
such as provided by about 0.02 M to about 0.10 M NaCl at
temperatures of about 50°C to about 70°C. Such high
stringency conditions tolerate little, if any, mismatch
between the oligonucleotide and the template or target
strand. It generally is appreciated that conditions can
be rendered more stringent by the addition of increasing
amounts of formamide.

For certain applications, for example, by
analogy to substitution of nucleotides by site-directed
mutagenesis, it is appreciated that lower stringency
conditions can be used. Under these conditions,
hybridization can occur even though the sequences of
probe and target strand are not perfectly complementary,
but are mismatched at one or more positions. Conditions
can be rendered less stringent by increasing salt
concentration and decreasing temperature. For example, a
medium stringency condition could be provided by about
0.1 to 0.25 M NaCl at temperatures of about 37°C to about
55°C, while a low stringency condition could be provided
by about 0.15 M to about 0.9 M salt, at temperatures
ranging from about 20°C to about 55°C. Thus,
hybridization conditions can be readily manipulated
depending on the desired results.
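
By way of illustration only, the approximate salt and temperature
ranges recited above can be encoded as a small lookup. The following
Python sketch is not part of the described method; the names
`Stringency`, `RANGES` and `matching_stringencies` are hypothetical,
and the ranges are treated as hard boundaries although the text gives
them only as approximate ("about") values. Because the recited ranges
overlap, a given condition can fall under more than one label.

```python
from typing import List, NamedTuple, Tuple


class Stringency(NamedTuple):
    name: str
    salt_molar: Tuple[float, float]    # (min, max) salt concentration in M
    temp_celsius: Tuple[float, float]  # (min, max) temperature in deg C


# Approximate ranges quoted in the text; treated here as hard limits (assumption).
RANGES = [
    Stringency("high", (0.02, 0.10), (50.0, 70.0)),
    Stringency("medium", (0.10, 0.25), (37.0, 55.0)),
    Stringency("low", (0.15, 0.90), (20.0, 55.0)),
]


def matching_stringencies(salt_molar: float, temp_celsius: float) -> List[str]:
    """Return every stringency label whose quoted ranges contain the condition."""
    return [
        r.name
        for r in RANGES
        if r.salt_molar[0] <= salt_molar <= r.salt_molar[1]
        and r.temp_celsius[0] <= temp_celsius <= r.temp_celsius[1]
    ]
```

A condition lying outside all three ranges returns an empty list, which
simply means the text does not classify it.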

In certain embodiments, it will be advantageous
to determine the hybridization of oligonucleotides by
employing a label. A wide variety of appropriate labels
are known in the art, including fluorescent, radioactive,
enzymatic or other ligands, such as avidin/biotin, which
are capable of being detected. In preferred embodiments,
one can desire to employ a fluorescent label or an enzyme
tag such as urease, alkaline phosphatase or peroxidase,
instead of radioactive or other environmentally
undesirable reagents. In the case of enzyme tags,
colorimetric indicator substrates are known that can be
employed to provide a means for detection visible to the
human eye or spectrophotometrically to identify whether
specific hybridization with complementary oligonucleotide
has occurred.

In embodiments involving a solid phase, for
example, at least one oligonucleotide of an initiating
polynucleotide is adsorbed or otherwise affixed to a
selected matrix or surface. This fixed, single-stranded
nucleic acid is then subjected to hybridization with the
complementary oligonucleotides under desired conditions.
The selected conditions will also depend on the
particular circumstances based on the particular criteria
required (depending, for example, on the G+C content,
type of target nucleic acid, source of nucleic acid, size
of hybridization probe, etc.). Following washing of the
hybridized surface to remove non-specifically bound
oligonucleotides, the hybridization can be detected, or
even quantified, by means of the label.



In one embodiment, the method of the invention
further includes identifying a second polynucleotide
sequence present in the target polynucleotide which is
contiguous with the initiating polynucleotide and
includes at least one plus strand oligonucleotide
annealed to at least one minus strand oligonucleotide
resulting in a partially double-stranded polynucleotide
comprised of a 5' overhang, a 3' overhang, or a 5'
overhang and a 3' overhang, where at least one overhang
of the second polynucleotide is complementary to at least
one overhang of the initiating polynucleotide. In this
embodiment, the invention further provides a third
polynucleotide present in the target polynucleotide which
is contiguous with the initiating sequence and provides a
5' overhang, a 3' overhang, or a 5' overhang and a 3'
overhang, where at least one overhang of the third
polynucleotide is complementary to at least one overhang
of the initiating polynucleotide which is not
complementary to an overhang of the second
polynucleotide. Subsequent polynucleotides are added at
alternating ends so as to extend the initiating
polynucleotide in an alternating bi-directional manner.






The method further provides contacting the
initiating polynucleotide with the second polynucleotide
and the third polynucleotide under conditions and for
such time suitable for annealing, the contacting
resulting in a contiguous double-stranded polynucleotide
and thereby in the extension of the initiating
polynucleotide. The annealed polynucleotides are
optionally contacted with a ligase under conditions
suitable for ligation. The method discussed above is
optionally repeated to sequentially add double-stranded
polynucleotides to the extended initiating polynucleotide
through repeated cycles of annealing and ligation.

As described herein, in the methods of the
invention the initiating polynucleotide can be extended
by uni-directional or by bi-directional extension as well
as by mixed uni-directional and bi-directional extension.
As described above, in an alternating bi-directional
extension a next most terminal oligonucleotide or
polynucleotide is added that has at least one overhang
complementary to at least one overhang of the extended
initiating polynucleotide which is not complementary to
an overhang of the oligonucleotide or polynucleotide that
was added immediately before, thus resulting in an
alternating bi-directional pattern of addition of
subsequent next most terminal oligonucleotides. In other
embodiments, a next most terminal oligonucleotide can be
any next most terminal polynucleotide regardless of
whether its addition to the extending double-stranded
polynucleotide will result in bi-directional or uni-
directional extension. In such embodiments, the next
most terminal oligonucleotide or polynucleotide has an
overhang that can be complementary to any overhang of the
extending double-stranded polynucleotide.
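
As a non-limiting illustration, the order of addition in an
alternating bi-directional extension can be sketched as follows. The
sketch assumes, purely for illustration, that the fragments of the
target polynucleotide are numbered consecutively from one end and that
the initiating polynucleotide is fragment `start`; the function name
`alternating_addition_order` is hypothetical. When one end of the
target is reached, the sketch falls back to uni-directional extension,
which corresponds to the mixed mode mentioned above.

```python
from typing import List


def alternating_addition_order(n_fragments: int, start: int) -> List[int]:
    """Order in which fragment indices are added around the initiating fragment."""
    if not 0 <= start < n_fragments:
        raise ValueError("start must index one of the fragments")
    order = [start]
    left, right = start - 1, start + 1
    take_right = True  # which end receives the next fragment
    while left >= 0 or right < n_fragments:
        if (take_right and right < n_fragments) or left < 0:
            order.append(right)  # extend at the higher-numbered end
            right += 1
        else:
            order.append(left)   # extend at the lower-numbered end
            left -= 1
        take_right = not take_right
    return order
```

For five fragments with the initiating polynucleotide in the middle,
the fragments are added alternately at each end; with the initiating
polynucleotide at one terminus, the same function yields a purely
uni-directional order.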






According to the methods of the invention a
polynucleotide can be assembled randomly or can encompass
a target polynucleotide sequence, which can be designed de
novo or derived from a predetermined model polynucleotide
sequence. A target polynucleotide can be of any desired
size or complexity and can include anything from
polynucleotides encoding entire genomes or pathways to
partial sequences, segments or fragments of an
oligonucleotide or polynucleotide. A model
polynucleotide sequence includes any nucleic acid
sequence that is predetermined before assembly and, for
example, can encode a model polypeptide sequence. A
model polypeptide sequence can provide a basis for
designing a modified polynucleotide such that a target
polynucleotide incorporating the desired modification is
synthesized.

The present invention also provides
methods that can be used to synthesize, de novo,
polynucleotides that encode sets of genes, either
naturally occurring genes expressed from natural or
artificial promoter constructs or artificial genes
derived from synthetic DNA sequences, which encode
elements of biological systems that perform a specified
function or attribute of an artificial organism, as well
as entire genomes. In producing such systems and
genomes, the present invention provides the synthesis of
a replication-competent, double-stranded polynucleotide,
wherein the polynucleotide has an origin of replication,
a first coding region and a first regulatory region
directing the expression of the first coding region.



As used herein, the term "replication-
competent" refers to a polynucleotide that is capable of
directing its own replication. A replication-competent
polynucleotide encompasses a regulatory region that has
the cis-acting signals and regulatory elements required
to direct expression of the coding region. The
replication-competent polynucleotide obviates the need
for recombinant methods such as cloning of a synthesized
coding region into a vector such as a plasmid or a virus
in order to confer replication-competence.

A polynucleotide sequence defining a gene,
genome, set of genes or protein sequence can be designed
in a computer-assisted manner (discussed below) and used
to generate a set of parsed oligonucleotides covering the
plus (+) and minus (-) strand of the sequence. As used
herein, "parsed" means a target polynucleotide sequence
has been delineated in a computer-assisted manner such
that a series of contiguous oligonucleotide sequences are
identified. The oligonucleotide sequences are
individually synthesized and used in a method of the
invention to generate a target polynucleotide. The
length of an oligonucleotide is quite variable.
Preferably, oligonucleotides used in the methods of the
invention are between about 15 and 100 bases and more
preferably between about 20 and 50 bases. Specific
lengths include, but are not limited to, 15, 16, 17, 18,
19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32,
33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46,
47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60,
61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74,
75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88,
89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99 and 100 bases.



Depending on the size, the overlap between the
oligonucleotides having partial complementarity can be
designed to be between 5 and 75 bases per oligonucleotide
pair.
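
The parsing step described above can be sketched, under stated
assumptions, as a simple tiling of the target sequence. The following
Python example is illustrative only and is not the computer-assisted
design procedure of the invention: it assumes a plain A/C/G/T target,
fixed-length plus strand tiles, and minus strand oligonucleotides that
are offset so that each one overlaps the preceding plus strand
oligonucleotide by `overlap` bases. The names `reverse_complement` and
`parse_target` are hypothetical.

```python
from typing import List, Tuple

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(_COMPLEMENT)[::-1]


def parse_target(
    target: str, oligo_len: int = 50, overlap: int = 25
) -> Tuple[List[str], List[str]]:
    """Tile a target into plus strand and offset minus strand oligos (5'->3')."""
    if not 0 < overlap < oligo_len:
        raise ValueError("overlap must be between 0 and oligo_len")
    target = target.upper()
    # Plus strand: contiguous, non-overlapping tiles of the target.
    plus = [target[i:i + oligo_len] for i in range(0, len(target), oligo_len)]
    # Minus strand: windows shifted so each bridges two adjacent plus tiles.
    offset = oligo_len - overlap
    minus = [
        reverse_complement(target[i:i + oligo_len])
        for i in range(offset, len(target), oligo_len)
    ]
    return plus, minus
```

With the default values, a 150-base target gives three 50-base plus
strand oligonucleotides and three minus strand oligonucleotides, the
last of which is shorter because it reaches the end of the target.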

The oligonucleotides preferably are treated
with polynucleotide kinase, for example, T4
polynucleotide kinase. The kinasing can be performed
prior to mixing of the oligonucleotide set, or after
mixing but before annealing. After annealing, the
oligonucleotides are treated with an enzyme having a
ligating function. For example, a DNA ligase typically
will be employed for this function. However,
topoisomerase, which does not require 5' phosphorylation,
is rapid and operates at room temperature, can be
used instead of ligase. For example, 50-base
oligonucleotides overlapping by 25 bases can be
synthesized by an oligonucleotide array synthesizer
(OAS). A 5' (+) strand set of oligonucleotides is
synthesized in one 96-well plate and the second 3' or (-)
strand set is synthesized in a second 96-well microtiter
plate. Synthesis can be carried out using
phosphoramidite chemistry modified to miniaturize the
reaction size and generate small reaction volumes and
yields in the range of 2 to 5 nmole. Synthesis is done
on controlled pore glass beads (CPGs), then the completed
oligonucleotides are deblocked, deprotected and removed
from the beads. The oligonucleotides are lyophilized,
re-suspended in water and 5' phosphorylated using
polynucleotide kinase and ATP to enable ligation.



The set of arrayed oligonucleotide sequences in
the plate can be assembled using a mixed pooling
strategy. For example, systematic pooling of component
oligonucleotides can be performed using a modified
Beckman Biomek automated pipetting robot, or another
automated lab workstation. The fragments can be combined
with buffer and enzyme (Taq I DNA ligase or Egea
Assemblase™, for example). Pooling can be performed in
microwell plates. After each step of pooling, the
temperature is ramped to enable annealing and ligation,
then additional pooling is carried out.

In the assembly methods of the invention, slow
annealing by generally no more than 1.5°C per minute to
37°C or below can be performed to maximize the efficiency of
hybridization. Slow annealing can be accomplished by a
variety of methods, for example, with a programmable
thermocycler. The cooling rate can be linear or non-
linear and can be, for example, 0.1°C, 0.2°C, 0.3°C, 0.4°C,
0.5°C, 0.6°C, 0.7°C, 0.8°C, 0.9°C, 1.0°C, 1.1°C, 1.2°C,
1.3°C, 1.4°C, 1.5°C, 1.6°C, 1.7°C, 1.8°C, 1.9°C, or 2.0°C
per minute. The cooling rate can be adjusted up or down to
maximize efficiency and accuracy.
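
For illustration, a linear slow-annealing ramp of the kind described
above can be expressed as a list of time and temperature set points.
The sketch below is an assumption-laden example rather than a
thermocycler program: the starting temperature is not specified in the
text and must be supplied by the user, and the function name
`linear_cooling_schedule` and its one-minute default step are
hypothetical.

```python
from typing import List, Tuple


def linear_cooling_schedule(
    start_c: float,
    end_c: float = 37.0,
    rate_c_per_min: float = 1.5,
    step_min: float = 1.0,
) -> List[Tuple[float, float]]:
    """Return (minute, temperature) set points for a linear ramp from start_c to end_c."""
    if rate_c_per_min <= 0 or step_min <= 0:
        raise ValueError("rate and step must be positive")
    if start_c < end_c:
        raise ValueError("start temperature must not be below end temperature")
    points = []
    minute = 0.0
    temp = start_c
    while temp > end_c:
        points.append((minute, temp))
        minute += step_min
        temp = start_c - rate_c_per_min * minute
    # Final set point: the moment the end temperature is reached.
    points.append(((start_c - end_c) / rate_c_per_min, end_c))
    return points
```

A slower ramp is obtained by lowering `rate_c_per_min`; a non-linear
ramp would require a different schedule function and is not covered by
this sketch.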



Target polynucleotide assembly involves forming
a set of intermediates. A set of intermediates can
include a plus strand oligonucleotide annealed to a minus
strand oligonucleotide, as described above. The annealed
intermediate can be formed by providing a single plus
strand oligonucleotide annealed to a single minus strand
oligonucleotide.

Alternatively, two or more oligonucleotides can
comprise the plus strand or the minus strand. For
example, in order to construct a polynucleotide (e.g., an
initiating polynucleotide) which can be used to assemble
a target polynucleotide of the invention, three or more
oligonucleotides can be annealed. Thus, a first plus
strand oligonucleotide, a second plus strand
oligonucleotide contiguous with the first plus strand
oligonucleotide, and a minus strand oligonucleotide
having a first contiguous sequence which is at least
partially complementary to the first plus strand
oligonucleotide and a second contiguous sequence which is
at least partially complementary to the second plus
strand oligonucleotide can be annealed to form a
partially double-stranded polynucleotide. The
polynucleotide can include a 5' overhang, a 3' overhang,
or a 5' overhang and a 3' overhang. The first plus
strand oligonucleotide and second plus strand
oligonucleotide are contiguous sequences such that they
are ligatable. The minus strand oligonucleotide is
partially complementary to both plus strand
oligonucleotides and acts as a "bridge" or "stabilizer"
sequence by annealing to both oligonucleotides.
Subsequent polynucleotides comprised of more than two
oligonucleotides annealed as previously described can be
used to assemble a target polynucleotide in a manner
resulting in a contiguous double-stranded polynucleotide.






An example of using two or more plus strand
oligonucleotides to assemble a polynucleotide is shown in
Figure 3. A triplex of three oligonucleotides of about
50 bases each, which overlap by about 25 bases, forms a "nicked"
intermediate. Two of these oligonucleotides provide a
ligation substrate joined by ligase and the third
oligonucleotide is a stabilizer that brings together two
specific sequences by annealing, resulting in the
formation of a part of the final polynucleotide
construct. This intermediate provides a substrate for
DNA ligase which, through its nick sealing activity,
joins the two 50-base oligonucleotides into a single
100-base single-stranded polynucleotide.
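
The triplex intermediate can be modeled in a few lines, offered here
only as a hedged sketch. The example assumes perfect complementarity
between the stabilizer and the two plus strand oligonucleotides
(whereas the text requires only partial complementarity), treats
ligation as simple concatenation, and uses the hypothetical names
`reverse_complement` and `ligate_triplex`.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(_COMPLEMENT)[::-1]


def ligate_triplex(plus_a: str, plus_b: str, bridge: str, overlap: int = 25) -> str:
    """Join two plus strand oligos if the minus strand bridge spans the nick.

    plus_a and plus_b are contiguous plus strand oligonucleotides (5'->3');
    bridge is the minus strand stabilizer (5'->3') expected to pair with the
    last `overlap` bases of plus_a and the first `overlap` bases of plus_b.
    """
    junction = plus_a[-overlap:] + plus_b[:overlap]
    if reverse_complement(bridge) != junction:
        raise ValueError("bridge does not span the nick between the two oligonucleotides")
    # Nick sealing is modeled as concatenation of the two plus strand oligos.
    return plus_a + plus_b
```

Two 50-base plus strand oligonucleotides and a matching 50-base
stabilizer therefore give a 100-base product, in line with the example
of Figure 3.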

Following initial pooling and formation of
annealed products, the products are assembled into
increasingly larger polynucleotides. For example,
following triplex formation of oligonucleotides, sets of
triplexes are systematically joined, ligated, and
assembled. Each step can be mediated by robotic pooling,
ligation and thermal cycling to achieve annealing and
denaturation. The final step joins assembled pieces into
a complete sequence representing all of the fragments in
the array. Since the efficiency of yield at each step is
less than 100%, the mass amount of completed product in
the final mixture can be very small. Optionally,
additional specific oligonucleotide primers, usually 15
to 20 bases and complementary to the extreme ends of the
assembly, can be annealed and PCR amplification carried
out, thereby amplifying and purifying the final full-
length product.
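
The effect of a per-step efficiency below 100% can be made concrete
with a short calculation. The sketch below assumes, for illustration
only, that every joining step has the same efficiency and that losses
multiply from step to step; neither assumption is stated in the text,
and the function name `cumulative_yield` is hypothetical.

```python
def cumulative_yield(start_nmol: float, step_efficiency: float, n_steps: int) -> float:
    """Amount of product remaining after n_steps, each with the same efficiency."""
    if not 0.0 <= step_efficiency <= 1.0:
        raise ValueError("step_efficiency must be between 0 and 1")
    if n_steps < 0:
        raise ValueError("n_steps must not be negative")
    # Losses compound multiplicatively across successive assembly steps.
    return start_nmol * step_efficiency ** n_steps
```

Under these assumptions, even a fairly efficient step leaves only a
fraction of the starting material after several rounds of pooling,
which is consistent with the optional PCR amplification of the
full-length product described above.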

The methods of the invention provide several
improvements over existing polynucleotide synthesis
technology. For example, synthesis can utilize
microdispensing piezoelectric or microsolenoid
nanodispensers allowing very fast synthesis, much smaller
reaction volumes and higher density plates as synthesis
vessels. The instrument will use up to 1536 well plates
giving a very high capacity. Additionally, controlled
pooling can be performed by a microfluidic manifold that
will move individual oligonucleotides through
microchannels and mix/ligate in a controlled way. This
will obviate the need for robotic pipetting and increase
speed and efficiency. Thus, an apparatus that
accomplishes a method of the invention will have a
greater capability for simultaneous reactions giving an
overall larger capacity for gene length.

Once target polynucleotides have been
synthesized using a method of the present invention, it
can be necessary to screen the sequences for analysis of
function. Specifically contemplated by the present
inventor are chip-based DNA technologies. Briefly, these
techniques involve quantitative methods for analyzing
large numbers of genes rapidly and accurately. By
tagging genes with oligonucleotides or using fixed probe
arrays, one can employ chip technology to segregate
target molecules as high-density arrays and screen these
molecules on the basis of hybridization.



The use of combinatorial synthesis and high
throughput screening assays is well known to those of
skill in the art. For example, U.S. Patent Nos.
5,807,754; 5,807,683; 5,804,563; 5,789,162; 5,783,384;
5,770,358; 5,759,779; 5,747,334; 5,686,242; 5,198,346;
5,738,996; 5,733,743; 5,714,320; and 5,663,046 (each
specifically incorporated herein by reference) describe
screening systems useful for determining the activity of
a target polypeptide. These patents teach various
aspects of the methods and compositions involved in the
assembly and activity analyses of high-density arrays of
different polysubunits (polynucleotides or polypeptides).
As such it is contemplated that the methods and
compositions described in the patents listed above can be
useful in assaying the activity profiles of the target
polypeptides of the present invention.



In another embodiment, the invention provides a
method of synthesizing a target polynucleotide by
providing a target polynucleotide sequence and
identifying at least one initiating polynucleotide
sequence present in the target polynucleotide sequence
that includes at least one plus strand oligonucleotide
annealed to at least one minus strand oligonucleotide
resulting in a double-stranded polynucleotide. The
initiating polynucleotide is contacted under conditions
suitable for primer annealing with a first
oligonucleotide having partial complementarity to the 3'
portion of the plus strand of the initiating
polynucleotide, and a second oligonucleotide having
partial complementarity to the 3' portion of the minus
strand of the initiating polynucleotide. Primer
extension is subsequently performed using polynucleotide
synthesis from the 3'-hydroxyl of: 1) the plus strand of
the initiating polynucleotide; 2) the annealed first
oligonucleotide; 3) the minus strand of the initiating
polynucleotide; and 4) the annealed second
oligonucleotide. The synthesis results in the initiating
sequence being extended bi-directionally thereby forming
a nascent extended initiating polynucleotide. The
extended initiating sequence can be further extended by
repeated cycles of annealing and primer extension.
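
One cycle of the bi-directional primer extension described above can
be sketched at the sequence level as follows. This is a simplified
model under explicit assumptions: the annealing regions are taken to be
perfectly complementary over a fixed length `overlap`, polymerization
is assumed to run to the end of each template, and only the plus strand
of the extended duplex is returned. The names `reverse_complement` and
`extend_bidirectionally` are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(_COMPLEMENT)[::-1]


def extend_bidirectionally(
    init_plus: str, first_oligo: str, second_oligo: str, overlap: int
) -> str:
    """Plus strand of the duplex after one annealing and extension cycle.

    first_oligo anneals to the 3' portion of the plus strand, and
    second_oligo anneals to the 3' portion of the minus strand; both are
    written 5'->3'.
    """
    if not 0 < overlap <= len(init_plus):
        raise ValueError("overlap must be positive and no longer than the initiator")
    first_as_plus = reverse_complement(first_oligo)  # read in plus strand sense
    if first_as_plus[:overlap] != init_plus[-overlap:]:
        raise ValueError("first oligonucleotide does not anneal to the plus strand 3' end")
    if second_oligo[-overlap:] != init_plus[:overlap]:
        raise ValueError("second oligonucleotide does not anneal to the minus strand 3' end")
    # Extension copies the unpaired part of each oligonucleotide onto each end.
    return second_oligo[:-overlap] + init_plus + first_as_plus[overlap:]
```

Feeding the returned sequence back in as `init_plus` with a new pair
of oligonucleotides models the repeated cycles of annealing and primer
extension.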



As previously noted, oligonucleotides can be
used as building blocks to assemble polynucleotides
through annealing and ligation reactions. Alternatively,
oligonucleotides can be used as primers to manufacture
polynucleotides through annealing and primer extension
reactions. The term "primer" is used herein to refer to
a binding element which comprises an oligonucleotide,
whether occurring naturally as in a purified restriction
digest or produced synthetically, which is capable of
acting as a point of initiation of synthesis when placed
under conditions in which synthesis of a primer extension
product which is complementary to a nucleic acid strand
is induced, i.e., in the presence of appropriate
nucleotides and an agent for polymerization such as a DNA
polymerase in an appropriate buffer ("buffer" includes
pH, ionic strength, cofactors, etc.) and at a suitable
temperature.

The primer is preferably single stranded for
maximum efficiency in amplification, but can
alternatively be double stranded. If double stranded,
the primer is first treated to separate its strands
before being used to prepare extension products.
Preferably, the primer is an oligodeoxyribonucleotide.
The primer must be sufficiently long to prime the
synthesis of extension products in the presence of the
agent for polymerization. The exact lengths of the
primers will depend on many factors, including
temperature and source of primer and use of the method.
Primers having only short sequences capable of
hybridization to the target nucleotide sequence generally
require lower temperatures to form sufficiently stable
hybrid complexes with the template.



The primers herein are selected to be
"substantially" complementary to the different strands of
each specific sequence to be amplified. This means that
the primers must be sufficiently complementary to
hybridize with their respective strands. Therefore, the
primer sequence need not reflect the exact sequence of
the template. Commonly, however, the primers have exact
complementarity except with respect to analyses effected
according to the method described in Nucleic Acids
Research 17 (7) 2503-2516 (1989) or a corresponding
method employing linear amplification or an amplification
technique other than the polymerase chain reaction.

The agent for primer extension of an
oligonucleotide can be any compound or system that will
function to accomplish the synthesis of primer extension
products, including enzymes. Suitable enzymes for this
purpose include, for example, E. coli DNA Polymerase I,
Klenow fragment of E. coli DNA polymerase I, T4 DNA
polymerase, other available DNA polymerases, reverse
transcriptase, and other enzymes, including thermostable
enzymes. The term "thermostable enzyme" as used herein
refers to any enzyme that is stable to heat and is heat
resistant and catalyses (facilitates) combination of the
nucleotides in the proper manner to form the primer
extension products which are complementary to each
nucleic acid strand. Generally, the synthesis will be
initiated at the 3' end of each primer and will proceed
in the 5' direction along the template strand, until
synthesis terminates. A preferred thermostable enzyme
that can be employed in the process of the present
invention is that which can be extracted and purified
from Thermus aquaticus. Such an enzyme has a molecular
weight of about 86,000-90,000 daltons. Thermus
aquaticus strain YT1 is available without restriction
from the American Type Culture Collection, 12301 Parklawn
Drive, Rockville, Md., U.S.A. as ATCC 25,104.

Processes for amplifying a desired target
polynucleotide are known and have been described in the
literature. K. Kleppe et al. in J. Mol. Biol. (1971),
56, 341-361 disclose a method for the amplification of a
desired DNA sequence. The method involves denaturation
of a DNA duplex to form single strands. The denaturation
step is carried out in the presence of a sufficiently
large excess of two nucleic acid primers that hybridize
to regions adjacent to the desired DNA sequence. Upon
cooling, two structures are obtained, each containing the
full length of the template strand appropriately
complexed with primer. DNA polymerase and a sufficient
amount of each required nucleoside triphosphate are added
whereby two molecules of the original duplex are
obtained. The above cycle of denaturation, primer
addition and extension is repeated until the appropriate
number of copies of the desired target polynucleotide is
obtained.

The present invention further provides a method
for the expression and isolation of a target polypeptide
encoded by a target polynucleotide. The method includes
incorporating a target polynucleotide synthesized by a
method of the invention into an expression vector;
introducing the expression vector into a suitable host
cell; culturing the host cell under conditions and for
such time as to promote the expression of the target
polypeptide encoded by the target polynucleotide; and
isolating the target polypeptide.

The invention can be used to modify certain
functional, structural, or phylogenic features of a model
polynucleotide encoding a model polypeptide resulting in
an altered target polypeptide. An input or model
polynucleotide sequence encoding a model polypeptide can
be electronically manipulated to determine a potential
for an effect of an amino acid change (or variance) at a
particular site or multiple sites in the model
polypeptide. Once identified, a novel target
polynucleotide sequence is assembled by a method of the
invention such that the target polynucleotide encodes a
target polypeptide possessing a characteristic different
from that of the model polypeptide.

The methods of the invention can rely on the
use of public sequence and structure databases. These
databases become more robust as more and more sequences
and structures are added. Information regarding the
amino acid sequence of a target polypeptide and the
tertiary structure of the polypeptide can be used to
synthesize oligonucleotides that can be assembled into a
target polynucleotide encoding a target polypeptide. A
model polypeptide should have sufficient structural
information to analyze the amino acids involved in the
function of the polypeptide. The structural information
can be derived from x-ray crystallography, NMR, or some
other technique for determining the structure of a
protein at the amino acid or atomic level. Once
selected, the sequence and structural information
obtained from the model polypeptide can be used to
generate a plurality of polynucleotides encoding a
plurality of variant amino acid sequences that comprise a
target polypeptide. Thus, a model polypeptide can be
selected based on overall sequence similarity to the
target protein or based on the presence of a portion
having sequence similarity to a portion of the target
polypeptide.

A "polypeptide", as used herein, is a polymer
in which the monomers are alpha amino acids and are
joined together through amide bonds. Amino acids can be
the L-optical isomer or the D-optical isomer.
Polypeptides are two or more amino acid monomers long and
are often more than 20 amino acid monomers long.
Standard abbreviations for amino acids are used (e.g., P
for proline). These abbreviations are included in
Stryer, Biochemistry, Third Ed., 1988, which is
incorporated herein by reference for all purposes. With
respect to polypeptides, "isolated" refers to a
polypeptide that constitutes the major component in a
mixture of components, e.g., 50% or more, 60% or more,
70% or more, 80% or more, 90% or more, or 95% or more by
weight. Isolated polypeptides typically are obtained by
purification from an organism in which the polypeptide
has been produced, although chemical synthesis is also
possible. Methods of polypeptide purification include,
for example, chromatography or immunoaffinity techniques.
Polypeptides of the invention can be detected by sodium
dodecyl sulphate (SDS)-polyacrylamide gel electrophoresis
followed by Coomassie Blue-staining or Western blot
analysis using monoclonal or polyclonal antibodies that
have binding affinity for the polypeptide to be detected.



A "chimeric polypeptide," as used herein, is a
polypeptide containing portions of amino acid sequence
derived from two or more different proteins, or two or
more regions of the same protein that are not normally
contiguous.






A "ligand", as used herein, is a molecule that
is recognized by a receptor. Examples of ligands that
can be investigated by this invention include, but are
not restricted to, agonists and antagonists for cell
membrane receptors, toxins and venoms, viral epitopes,
hormones, opiates, steroids, peptides, enzyme substrates,
cofactors, drugs, lectins, sugars, oligonucleotides,
nucleic acids, oligosaccharides, and proteins.

A "receptor", as used herein, is a molecule
that has an affinity for a ligand. Receptors can be
naturally-occurring or manmade molecules. They can be
employed in their unaltered state or as aggregates with
other species. Receptors can be attached, covalently or
noncovalently, to a binding member, either directly or
via a specific binding substance. Examples of receptors
which can be employed by this invention include, but are
not restricted to, antibodies, cell membrane receptors,
monoclonal antibodies and antisera reactive with specific
antigenic determinants, viruses, cells, drugs,
polynucleotides, nucleic acids, peptides, cofactors,
lectins, sugars, polysaccharides, cellular membranes, and
organelles. A "ligand receptor pair" is formed when two
molecules have combined through molecular recognition to

autoimmune diseases (e.g., by blocking the binding of the
"self" antibodies).

d) Polynucleotides: Sequences of
polynucleotides can be synthesized to establish DNA or
RNA binding sequences that act as receptors for
synthesized sequence.

e) Catalytic Polypeptides: Polymers, preferably
antibodies, which are capable of promoting a chemical
reaction involving the conversion of one or more
reactants to one or more products. Such polypeptides
generally include a binding site specific for at least
one reactant or reaction intermediate and an active
functionality proximate to the binding site, which
functionality is capable of chemically modifying the
bound reactant. Catalytic polypeptides and others are
described in, for example, PCT Publication Nos. WO
90/05746, WO 90/05749, and WO 90/05785, which are
incorporated herein by reference for all purposes.



f) Hormone receptors: Identification of the 
20 ligands that bind with high affinity to a receptor such 
as the receptors for insulin and growth hormone is useful 
in the development of, for example, an oral replacement 
of the daily injections which diabetics must take to 
relieve the symptoms of diabetes or a replacement for 
25 growth hormone. Other examples of hormone receptors 
include the vasoconstrictive hormone receptors; 
determination of ligands for these receptors can lead to 
the development of drugs to control blood pressure. 
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form a complex. 

Specific examples of polypeptides which can 
synthesized by this invention include but are not 
restricted to: 

a) Microorganism receptors: Determination of 
ligands that bind to microorganism receptors such as 
specific transport proteins or enzymes essential to 
survival of microorganisms would be a useful tool for 
discovering new classes of antibiotics. Of particular 
value would be antibiotics against opportunistic fungi, 
protozoa, and bacteria resistant to antibiotics in 
current use . 

b) Enzymes: For instance, a receptor can 
comprise a binding site of an enzyme such as an enzyme 
responsible for cleaving a neurotransmitter; 
determination of ligands for this type of receptor to 
modulate the action of an enzyme that cleaves a 
neurotransmitter is useful in developing drugs that can 
be used in the treatment of disorders of 
neurotransmission . 

c) Antibodies: For instance, the invention can 
be useful in investigating a receptor that comprises a 
ligand-binding site on an antibody molecule which 
combines with an epitope of an antigen of interest; 
determining a sequence that mimics an antigenic epitope 
can lead to the development of vaccines in which the 
immunogen is based on one or more of such sequences or 
lead to the development of related diagnostic agents or 
compounds useful in therapeutic treatments such as for 
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g) Opiate receptors: Determination of ligands 
which bind to the opiate receptors in the brain is useful 
in the development of less-addictive replacements for 
morphine and related drugs. 

In the context of a polypeptide, the term
"structure" refers to the three dimensional arrangement
of atoms in the protein. "Function" refers to any
measurable property of a protein. Examples of protein
function include, but are not limited to, catalysis,
binding to other proteins, binding to non-protein
molecules (e.g., drugs), and isomerization between two or
more structural forms. "Biologically relevant protein" 
refers to any protein playing a role in the life of an 
organism. 

To identify significant structural motifs, the
sequence of the model polypeptide is examined for matches
to the entries in one or more databases of recognized
domains, e.g., the PROSITE database domains (Bairoch,
Nucl. Acids. Res. 24:217, 1997) or the Pfam HMM database
(Bateman et al., (2000) Nucl. Acids. Res. 28:263). The
PROSITE database is a compilation of two types of
sequence signatures: profiles, typically representing
whole protein domains, and patterns, typically
representing just the most highly conserved functional or
structural aspects of protein domains.

The methods of the invention can be used to 
generate polypeptides containing polymorphisms that have 
an effect on a catalytic activity of a target polypeptide 
or a non-catalytic activity of the target polypeptide 
(e.g., structure, stability, binding to a second protein
or polypeptide chain, binding to a nucleic acid molecule,
binding to a small molecule, and binding to a
macromolecule that is neither a protein nor a nucleic
acid). For example, the invention provides a means for
assembling any polynucleotide sequence encoding a target
polypeptide such that the encoded polypeptide can be
expressed and screened for a particular activity. By
altering particular amino acids at specific points in the
target polypeptide, the operating temperature, operating
pH, or any other characteristic of a polypeptide can be
manipulated resulting in a polypeptide with a unique
activity. Thus, the methods of the invention can be used
to identify amino acid substitutions that can be made to
engineer the structure or function of a polypeptide of
interest (e.g., to increase or decrease a selected
activity or to add or remove a selective activity).

In addition, the methods of the invention can 
be used in the identification and analysis of candidate 
polymorphisms for polymorphism-specific targeting by 
pharmaceutical or diagnostic agents, for the
identification and analysis of candidate polymorphisms
for pharmacogenomic applications, and for experimental 
biochemical and structural analysis of pharmaceutical 
targets that exhibit amino acid polymorphism. 

A library of target polynucleotides encoding a
plurality of target polypeptides can be prepared by the
present invention. Host cells are transformed by
artificial introduction of the vectors containing the
target polynucleotide by inoculation under conditions
conducive for such transformation. The resultant
libraries of transformed clones are then screened for
clones which display activity for the polypeptide of
interest in a phenotypic assay for activity.

A target polynucleotide of the invention can be 
incorporated (i.e., cloned) into an appropriate vector. 
For purposes of expression, the target sequences encoding
a target polypeptide of the invention can be inserted
into a recombinant expression vector. The term
"recombinant expression vector" refers to a plasmid,
virus, or other vehicle known in the art that has been
manipulated by insertion or incorporation of the
polynucleotide sequence encoding a target polypeptide of
the invention. The expression vector typically contains
an origin of replication, a promoter, as well as specific
genes that allow phenotypic selection of the transformed
cells. Vectors suitable for use in the present invention
include, but are not limited to, the T7-based expression
vector for expression in bacteria (Rosenberg et al.,
Gene, 56:125, 1987), the pMSXND expression vector for
expression in mammalian cells (Lee and Nathans, J. Biol.
Chem., 263:3521, 1988), baculovirus-derived vectors for
expression in insect cells, cauliflower mosaic virus
(CaMV), and tobacco mosaic virus (TMV).

Depending on the vector utilized, any of a 
number of suitable transcription and translation 
elements, including constitutive and inducible promoters,
transcription enhancer elements, transcription
terminators, etc. can be used in the expression vector
(see, e.g., Bitter et al., Methods in Enzymology,
153:516-544, 1987). These elements are well known to one
of skill in the art.

The term "operably linked" or "operably
associated" refers to functional linkage between the
regulatory sequence and the polynucleotide sequence
regulated by the regulatory sequence. The operably
linked regulatory sequence controls the expression of the
product expressed by the polynucleotide sequence.
Alternatively, the functional linkage also includes an
enhancer element.



"Promoter" means a nucleic acid regulatory 
sequence sufficient to direct transcription. Also
included in the invention are those promoter elements
that are sufficient to render promoter-dependent
polynucleotide sequence expression controllable for
cell-type specific or tissue specific expression, or
inducible by external signals or agents; such elements
can be located in the 5' or 3' regions of the native
gene, or in the introns.

"Gene expression" or "polynucleotide sequence 
expression" means the process by which a nucleotide 
sequence undergoes successful transcription and 
translation such that detectable levels of the delivered
nucleotide sequence are expressed in an amount and over a 
time period so that a functional biological effect is 
achieved. 



In yeast, a number of vectors containing
constitutive or inducible promoters can be used
(Current Protocols in Molecular Biology, Vol. 2, Ed.
Ausubel et al., Greene Publish. Assoc. & Wiley
Interscience, Ch. 13, 1988; Grant et al., "Expression and
Secretion Vectors for Yeast," in Methods in Enzymology,
Eds. Wu & Grossman, Acad. Press, N.Y., Vol. 153,
pp. 516-544, 1987; Glover, DNA Cloning, Vol. II, IRL Press,
Wash., D.C., Ch. 3, 1986; Bitter, "Heterologous Gene
Expression in Yeast," Methods in Enzymology, Eds. Berger
& Kimmel, Acad. Press, N.Y., Vol. 152, pp. 673-684, 1987;
and The Molecular Biology of the Yeast Saccharomyces,
Eds. Strathern et al., Cold Spring Harbor Press, Vols. I
and II, 1982). A constitutive yeast promoter, such as
ADH or LEU2, or an inducible promoter, such as GAL, can
be used ("Cloning in Yeast," Ch. 3, R. Rothstein In: DNA
Cloning Vol. II, A Practical Approach, Ed. DM Glover, IRL
Press, Wash., D.C., 1986). Alternatively, vectors can be
used which promote integration of foreign DNA sequences
into the yeast chromosome.

In certain embodiments, it can be desirable to
include specialized regions known as telomeres at the end
of a target polynucleotide sequence. Telomeres are
repeated sequences found at chromosome ends and it has
long been known that chromosomes with truncated ends are
unstable, tend to fuse with other chromosomes and are
otherwise lost during cell division.

Some data suggest that telomeres interact with 
the nucleoprotein complex and the nuclear matrix. One 
putative role for telomeres includes stabilizing 
chromosomes and shielding the ends from degradative
enzymes.

Another possible role for telomeres is in
replication. According to present doctrine, replication
of DNA starts from short RNA primers annealed to
the 3'-end of the template. The result of this mechanism
is an "end replication problem" in which the region
corresponding to the RNA primer is not replicated. Over
many cell divisions, this will result in the progressive
truncation of the chromosome. It is thought that
telomeres can provide a buffer against this effect, at
least until they are themselves eliminated by this
effect. A further structure that can be included in
a target polynucleotide is a centromere.

In certain embodiments of the invention, the 
delivery of a nucleic acid in a cell can be identified in 
vitro or in vivo by including a marker in the expression
construct. The marker would result in an identifiable 
change to the transfected cell permitting easy 
identification of expression. 

An expression vector of the invention can be 
used to transform a target cell. By "transformation" is
meant a genetic change induced in a cell following
incorporation of new DNA (i.e., DNA exogenous to the
cell). Where the cell is a mammalian cell, the genetic
change is generally achieved by introduction of the DNA
into the genome of the cell. By "transformed cell" is
meant a cell into which (or into an ancestor of which)
such DNA has been introduced by means of recombinant DNA
techniques. Transformation of a host cell with
recombinant DNA can be carried out by conventional
techniques as are well known to those skilled in the art.
Where the host is prokaryotic, such as E. coli, competent
cells that are capable of DNA uptake can be prepared from
cells harvested after exponential growth phase and
subsequently treated by the CaCl2 method by procedures
well known in the art. Alternatively, MgCl2 or RbCl can
be used. Transformation can also be performed after
forming a protoplast of the host cell or by
electroporation.



A target polypeptide of the invention can be 
produced in prokaryotes by expression of nucleic acid
encoding the polypeptide. These include, but are not
limited to, microorganisms, such as bacteria transformed
with recombinant bacteriophage DNA, plasmid DNA, or
cosmid DNA expression vectors encoding a polypeptide of
the invention. The constructs can be expressed in E.
coli in large scale for in vitro assays. Purification
from bacteria is simplified when the sequences include
tags for one-step purification by nickel-chelate
chromatography. The construct can also contain a tag to
simplify isolation of the polypeptide. For example, a
polyhistidine tag of, e.g., six histidine residues, can
be incorporated at the amino terminal end, or carboxy
terminal end, of the protein. The polyhistidine tag
allows convenient isolation of the protein in a single
step by nickel-chelate chromatography. The target
polypeptide of the invention can also be engineered to
contain a cleavage site to aid in protein recovery. 
Alternatively, the polypeptides of the invention can be 
expressed directly in a desired host cell for assays in 
situ.

When the host is a eukaryote, such methods of
transfection of DNA as calcium phosphate co-precipitates,
conventional mechanical procedures, such as
microinjection, electroporation or biolistic techniques,
insertion of a plasmid encased in liposomes, or virus
vectors can be used. Eukaryotic cells can also be
cotransfected with DNA sequences encoding a polypeptide
of the invention, and a second foreign DNA molecule
encoding a selectable phenotype, such as the herpes
simplex thymidine kinase gene. Another method is to use
a eukaryotic viral vector, such as simian virus 40 (SV40)
or bovine papilloma virus, to transiently infect or
transform eukaryotic cells and express the protein
(Eukaryotic Viral Vectors, Cold Spring Harbor Laboratory,
Gluzman ed., 1982). Preferably, a eukaryotic host is
utilized as the host cell, as described herein.
Eukaryotic systems, and preferably mammalian expression
systems, allow for proper post-translational
modifications of expressed mammalian proteins to occur.
Eukaryotic cells that possess the cellular machinery for
proper processing of the primary transcript,
glycosylation, phosphorylation, and advantageously
secretion of the gene product should be used as host
cells for the expression of the polypeptide of the
invention. Such host cell lines can include, but are not
limited to, CHO, VERO, BHK, HeLa, COS, MDCK, Jurkat,
HEK-293, and WI38.

For long-term, high-yield production of 
recombinant proteins, stable expression is preferred. 
Rather than using expression vectors that contain viral
origins of replication, host cells can be transformed
with the cDNA encoding a target polypeptide of the
invention controlled by appropriate expression control
elements (e.g., promoter, enhancer sequences,
transcription terminators, polyadenylation sites, etc.),
and a selectable marker. The selectable marker in the
recombinant plasmid confers resistance to the selection
and allows cells to stably integrate the plasmid into
their chromosomes and grow to form foci that, in turn,
can be cloned and expanded into cell lines. For example,
following the introduction of foreign DNA, engineered
cells can be allowed to grow for 1-2 days in an enriched
medium, and then are switched to a selective medium. A
number of selection systems can be used, including, but
not limited to, the herpes simplex virus thymidine kinase
(Wigler et al., Cell, 11:223, 1977), hypoxanthine-guanine
phosphoribosyltransferase (Szybalska & Szybalski, Proc.
Natl. Acad. Sci. USA, 48:2026, 1962), and adenine
phosphoribosyltransferase (Lowy et al., Cell, 22:817,
1980) genes, which can be employed in tk-, hgprt- or aprt-
cells, respectively. Also, antimetabolite resistance can
be used as the basis of selection for dhfr, which confers
resistance to methotrexate (Wigler et al., Proc. Natl.
Acad. Sci. USA, 77:3567, 1980; O'Hare et al., Proc. Natl.
Acad. Sci. USA, 78:1527, 1981); gpt, which confers
resistance to mycophenolic acid (Mulligan & Berg, Proc.
Natl. Acad. Sci. USA, 78:2072, 1981); neo, which confers
resistance to the aminoglycoside G-418 (Colberre-Garapin
et al., J. Mol. Biol., 150:1, 1981); and hygro, which
confers resistance to hygromycin (Santerre et al.,
Gene, 30:147, 1984). Recently, additional selectable
genes have been described, namely trpB, which allows
cells to utilize indole in place of tryptophan; hisD,
which allows cells to utilize histidinol in place of
histidine (Hartman & Mulligan, Proc. Natl. Acad. Sci.
USA, 85:8047, 1988); and ODC (ornithine decarboxylase),
which confers resistance to the ornithine decarboxylase
inhibitor, 2-(difluoromethyl)-DL-ornithine, DFMO
(McConlogue L., In: Current Communications in Molecular
Biology, Cold Spring Harbor Laboratory, ed., 1987).

Isolation and purification of either microbially
or eukaryotically expressed polypeptides of the
invention can be carried out by any conventional
means, such as, for example, preparative chromatographic
separations and immunological separations, such as those
involving the use of monoclonal or polyclonal antibodies
or antigen.

A target polynucleotide, or expression 
construct containing a target polynucleotide, can be
entrapped in a liposome. Liposomes are vesicular
structures characterized by a phospholipid bilayer
membrane and an inner aqueous medium. Multilamellar
liposomes have multiple lipid layers separated by aqueous
medium and form spontaneously when phospholipids are
suspended in an excess of aqueous solution. The lipid
components undergo self-rearrangement before the
formation of closed structures and entrap water and
dissolved solutes between the lipid bilayers. The
liposome can be complexed with a hemagglutinating virus
(HVJ). This has been shown to facilitate fusion with the
cell membrane and promote cell entry of
liposome-encapsulated DNA. In other embodiments, the
liposome can be complexed or employed in conjunction with
nuclear non-histone chromosomal proteins (HMG-1). In yet
further embodiments, the liposome can be complexed or
employed in conjunction with both HVJ and HMG-1. Because
such expression constructs have been successfully employed
in transfer and expression of nucleic acid in vitro and in
vivo, they are applicable for the present invention.
Where a bacterial promoter is employed in the DNA
construct, it also will be desirable to include within
the liposome an appropriate bacterial polymerase.

The present invention describes methods for 
enabling the creation of a target polynucleotide based 
upon information only, i.e., without the requirement for
existing genes, DNA molecules or genomes. Generally,
using computer software, it is possible to construct a
virtual polynucleotide in the computer. This
polynucleotide consists of a string of DNA bases, G, A, T
or C, comprising for example an entire artificial
polynucleotide sequence in a linear string. Following
construction of a sequence, computer software is then
used to parse the target sequence, breaking it down into a
set of overlapping oligonucleotides of specified length.
Optional steps in sequence assembly include identifying
and eliminating sequences that may give rise to hairpins,
repeats or other sequences that are undesirable.
Therefore, success in a large gene construction can be
substantially improved by pre-screening sequences for
difficult regions or areas. In short, an amino acid
sequence is used to generate a synthetic gene sequence
using E. coli class II codons. Prior to sequence
parsing, a number of subroutines are applied to the
sequence to identify specific types of sequence
arrangements that could cause early termination in oligo
synthesis, difficulty or low efficiency in ligation or
synthesis, or unusual or atypical secondary structures.
Programs are used to analyze the sequence and identify:

Any region of over 25 base pairs with a GC
content of over 70%

Any 3' or 5' terminal sequences that would form
a "hairpin" hybrid of over 7 base pairs, allowing a loop
of up to 4 base pairs

Any sequence of 8 base pairs or more that has a
perfect inverted repeat within a 50 bp interval such that
an internal hairpin can be formed

Following identification, the sequence is
manually adjusted as follows. Third bases of codons will
be changed to remove hairpins or decrease the number of
pairing bases in the hairpin to less than five contiguous
bases.

Where possible, third bases of codons will be
changed, leaving the amino acid sequence unchanged, in
order to decrease the GC content of a region to less than
65% over 20 bases.

Where possible, third base changes will be
made, keeping the amino acid sequence the same, in order
to remove internal hybrids or decrease the number of
matching bases to less than 7.

The resulting synthetic DNA sequence will still
encode the same protein but the codon usage will be
adjusted to remove sequence structures that might cause
errors in assembly, might lower assembly efficiency or
otherwise cause problems in the technical procedure of
gene synthesis and assembly.
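
The pre-screening criteria and the third-base adjustment described above could be sketched in Python roughly as follows. This is an illustrative sketch only, not the program used by the inventors: the function names (`gc_rich_windows`, `inverted_repeats`, `soften_third_bases`) are hypothetical, the input is assumed to be an upper-case string of A, C, G and T, the window sizes are taken from the thresholds stated above, and the terminal hairpin check is omitted. The adjustment shown simply swaps a G or C in the third codon position for A or T whenever the standard genetic code keeps the amino acid unchanged; it does not consider codon preference and would normally be applied only to flagged regions.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
_BASES = "TCAG"
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {"".join(c): aa for c, aa in zip(product(_BASES, repeat=3), _AMINO)}
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons of dna into one-letter amino acid codes."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TO_AA[dna[i:i + 3]] for i in range(0, usable, 3))


def gc_rich_windows(seq, window=25, threshold=0.70):
    """Return 0-based starts of windows whose GC fraction exceeds threshold."""
    hits = []
    for start in range(len(seq) - window + 1):
        win = seq[start:start + window]
        if (win.count("G") + win.count("C")) / window > threshold:
            hits.append(start)
    return hits


def inverted_repeats(seq, stem=8, span=50):
    """Return (i, j) where seq[j:j+stem] is the reverse complement of
    seq[i:i+stem], the two arms do not overlap, and both lie within span bases."""
    hits = []
    for i in range(len(seq) - stem + 1):
        target = reverse_complement(seq[i:i + stem])
        last_j = min(i + span - stem, len(seq) - stem)
        for j in range(i + stem, last_j + 1):
            if seq[j:j + stem] == target:
                hits.append((i, j))
    return hits


def soften_third_bases(dna):
    """Replace a G/C third base with A or T when the amino acid is unchanged."""
    out = []
    for i in range(0, len(dna), 3):
        codon = dna[i:i + 3]
        if len(codon) == 3 and codon[2] in "GC":
            for alt in "AT":
                candidate = codon[:2] + alt
                if CODON_TO_AA[candidate] == CODON_TO_AA[codon]:
                    codon = candidate
                    break
        out.append(codon)
    return "".join(out)
```

In use, a region reported by `gc_rich_windows` or `inverted_repeats` would be passed through `soften_third_bases` (or adjusted by hand) and then re-screened, with `translate` confirming that the encoded protein is unchanged.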
Subsequent parsing of the target sequence
results in a set of shorter DNA sequences that overlap to
cover the entire length of the target polynucleotide in
overlapping sets.

Typically, a gene of 1000 base pairs would be
broken down into 20 100-mers where 10 of these comprise
one strand and 10 of these comprise the other strand.
They would be selected to overlap on each strand by 25 to
50 base pairs.

The degeneracy of the genetic code permits
substantial freedom in the choice of codons for any
particular amino acid sequence. Transgenic organisms
such as plants frequently prefer particular codons that,
though they encode the same protein, can differ from the
codons in the organism from which the gene was derived.
For example, U.S. Pat. No. 5,380,831 to Adang et al.
describes the creation of insect resistant transgenic
plants that express the Bacillus thuringiensis (Bt) toxin
gene. The Bt crystal protein, an insect toxin, is
encoded by a full-length gene that is poorly expressed in
transgenic plants. In order to improve expression in
plants, a synthetic gene encoding the protein containing
codons preferred in plants was substituted for the
natural sequence. The invention disclosed therein
comprised a chemically synthesized gene encoding an
insecticidal protein which is functionally equivalent to a
native insecticidal protein of Bt. The synthetic gene was
designed to be expressed in plants at a level higher than 
a native Bt gene.

In designing a target polynucleotide that
encodes a particular polypeptide, the hydropathic index
of amino acids can be considered. The importance of the
hydropathic amino acid index in conferring interactive
biologic function on a protein is generally understood in
the art. Each amino acid has been assigned a hydropathic
index on the basis of its hydrophobicity and charge
characteristics; these are: isoleucine (+4.5); valine
(+4.2); leucine (+3.8); phenylalanine (+2.8);
cysteine/cystine (+2.5); methionine (+1.9); alanine
(+1.8); glycine (-0.4); threonine (-0.7); serine (-0.8);
tryptophan (-0.9); tyrosine (-1.3); proline (-1.6);
histidine (-3.2); glutamate (-3.5); glutamine (-3.5);
aspartate (-3.5); asparagine (-3.5); lysine (-3.9); and
arginine (-4.5).

It is known in the art that certain amino acids
can be substituted by other amino acids having a similar
hydropathic index or score and still result in a protein
with similar biological activity, i.e., still obtain a
biologically functionally equivalent protein. In making
such changes, the substitution of amino acids whose
hydropathic indices are within ±2 is preferred, those
which are within ±1 are particularly preferred, and
those within ±0.5 are even more particularly preferred.
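
As an illustration of how the hydropathic index thresholds could be applied when evaluating a candidate substitution, the following Python sketch classifies a pair of residues by the narrowest preference tier it satisfies. The function name `hydropathy_tier` and the tier labels are hypothetical; the index values are the standard hydropathic indices listed above, keyed by one-letter amino acid code.

```python
# Hydropathic index for each amino acid (one-letter codes).
HYDROPATHIC_INDEX = {
    "I": 4.5, "V": 4.2, "L": 3.8, "F": 2.8, "C": 2.5, "M": 1.9, "A": 1.8,
    "G": -0.4, "T": -0.7, "S": -0.8, "W": -0.9, "Y": -1.3, "P": -1.6,
    "H": -3.2, "E": -3.5, "Q": -3.5, "D": -3.5, "N": -3.5, "K": -3.9,
    "R": -4.5,
}


def hydropathy_tier(original, replacement):
    """Return the narrowest tier containing the index difference, or None
    if the two residues differ by more than 2 units."""
    diff = abs(HYDROPATHIC_INDEX[original] - HYDROPATHIC_INDEX[replacement])
    tiers = ((0.5, "within 0.5"), (1.0, "within 1"), (2.0, "within 2"))
    for limit, label in tiers:
        # Small tolerance so that floating-point rounding does not
        # exclude differences that are exactly on a tier boundary.
        if diff <= limit + 1e-9:
            return label
    return None
```

For example, isoleucine to valine falls in the narrowest tier, lysine to arginine in the middle tier, and isoleucine to arginine in none of them.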

It is also understood in the art that the
substitution of like amino acids can be made effectively
on the basis of hydrophilicity. U.S. Patent 4,554,101,
incorporated herein by reference, states that the
greatest local average hydrophilicity of a protein, as
governed by the hydrophilicity of its adjacent amino
acids, correlates with a biological property of the
protein.

As detailed in U.S. Patent 4,554,101, the
following hydrophilicity values have been assigned to
amino acid residues: arginine (+3.0); lysine (+3.0);
aspartate (+3.0 ± 1); glutamate (+3.0 ± 1); serine
(+0.3); asparagine (+0.2); glutamine (+0.2); glycine (0);
threonine (-0.4); proline (-0.5 ± 1); alanine (-0.5);
histidine (-0.5); cysteine (-1.0); methionine (-1.3);
valine (-1.5); leucine (-1.8); isoleucine (-1.8); tyrosine
(-2.3); phenylalanine (-2.5); tryptophan (-3.4).



It is understood that an amino acid can be
substituted for another having a similar hydrophilicity
value and still obtain a biologically equivalent and
immunologically equivalent polypeptide. In such changes,
the substitution of amino acids whose hydrophilicity
values are within ±2 is preferred, those that are within
±1 are particularly preferred, and those within ±0.5 are
even more particularly preferred.
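
A comparable sketch for the hydrophilicity scale lists the residues whose values lie within a chosen tolerance of a given residue. The function name `similar_residues` is hypothetical, and the ±1 uncertainty attached to aspartate, glutamate and proline in the published scale is ignored here as a simplifying assumption.

```python
# Hydrophilicity values per U.S. Patent 4,554,101 (one-letter codes);
# the stated +/- 1 ranges for D, E and P are not modeled.
HYDROPHILICITY = {
    "R": 3.0, "K": 3.0, "D": 3.0, "E": 3.0, "S": 0.3, "N": 0.2, "Q": 0.2,
    "G": 0.0, "T": -0.4, "P": -0.5, "A": -0.5, "H": -0.5, "C": -1.0,
    "M": -1.3, "V": -1.5, "L": -1.8, "I": -1.8, "Y": -2.3, "F": -2.5,
    "W": -3.4,
}


def similar_residues(residue, tolerance=0.5):
    """Return, alphabetically, the other residues whose hydrophilicity
    value is within tolerance of the given residue."""
    ref = HYDROPHILICITY[residue]
    return sorted(
        other
        for other, value in HYDROPHILICITY.items()
        if other != residue and abs(value - ref) <= tolerance + 1e-9
    )
```

With the default tolerance of 0.5, arginine is grouped with lysine, aspartate and glutamate, while tryptophan has no neighbor until the tolerance is widened to 1.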


As outlined above, amino acid substitutions are 
generally based on the relative similarity of the amino
acid side-chain substituents, for example, their
hydrophobicity, hydrophilicity, charge, size, and the
like. Exemplary substitutions that take various of the
foregoing characteristics into consideration are well
known to those of skill in the art and include: arginine
and lysine; glutamate and aspartate; serine and
threonine; glutamine and asparagine; and valine, leucine
and isoleucine.





Aspects of the invention can be implemented in 
hardware or software, or a combination of both. However, 
preferably, the algorithms and processes of the invention 
are implemented in one or more computer programs 
executing on programmable computers each comprising at
least one processor, at least one data storage system
(including volatile and non-volatile memory and/or
storage elements), at least one input device, and at
least one output device. Program code is applied to
input data to perform the functions described herein and
generate output information. The output information is 
applied to one or more output devices, in known fashion. 

Each program can be implemented in any desired
computer language (including machine, assembly, high
level procedural, or object oriented programming
languages) to communicate with a computer system. In any
case, the language can be a compiled or interpreted
language.

Each such computer program is preferably stored
on a storage medium or device (e.g., ROM, CD-ROM, tape,
or magnetic diskette) readable by a general or special
purpose programmable computer, for configuring and
operating the computer when the storage media or device
is read by the computer to perform the procedures
described herein. The inventive system can also be
considered to be implemented as a computer-readable
storage medium, configured with a computer program, where
the storage medium so configured causes a computer to
operate in a specific and predefined manner to perform
the functions described herein.

Thus, in another embodiment, the invention
provides a computer program, stored on a
computer-readable medium, for generating a target
polynucleotide sequence. The computer program includes
instructions for causing a computer system to: 1) identify
an initiating polynucleotide sequence contained in the
target polynucleotide sequence; 2) parse the target
polynucleotide sequence into multiple distinct, partially
complementary, oligonucleotides; and 3) control assembly
of the target polynucleotide sequence by controlling the
bi-directional extension of the initiating polynucleotide
sequence by the sequential addition of partially
complementary oligonucleotides resulting in a contiguous
double-stranded polynucleotide. The computer program
will contain an algorithm for parsing the sequence of the
target polynucleotide by generating a set of
oligonucleotides corresponding to a polypeptide sequence.
The algorithm utilizes a polypeptide sequence to generate
a DNA sequence using a specified codon table. The
algorithm then generates a set of parsed oligonucleotides
corresponding to the (+) and (-) strands of the DNA
sequence in the following manner:



1. The DNA sequence GENE[], an array of bases, is
generated from the protein sequence AA[], an array of
amino acids, using a specified codon table. An example
of the codon table for E. coli type II codons is listed
below.

   a. Parameters

      i.    N             Length of protein in amino acid residues
      ii.   L = 3N        Length of gene in DNA bases
      iii.  Q             Length of each component oligonucleotide
      iv.   X = Q/2       Length of overlap between oligonucleotides
      v.    W = 3N/Q      Number of oligonucleotides in the F set
      vi.   Z = 3N/Q + 1  Number of oligonucleotides in the R set
      vii.  F[1:W]        Set of (+) strand oligonucleotides
      viii. R[1:Z]        Set of (-) strand oligonucleotides
      ix.   AA[1:N]       Array of amino acid residues
      x.    GENE[1:L]     Array of bases comprising the gene

   b. Obtain or design a protein sequence AA[]
      consisting of a list of amino acid residues.

   c. Generate the DNA sequence, GENE[], from
      the protein sequence, AA[]
      i.   For I = 1 to N
      ii.  Translate AA[J] from codon table generating GENE[I:I+2]
      iii. I = I + 3
      iv.  J = J + 1
      v.   Go to ii

2. Two sets of overlapping oligonucleotides are
generated from GENE[]; F[] covers the (+)
strand and R[] is a complementary, partially
overlapping set covering the (-) strand.

   a. Generate the F[] set of oligos
      i.   For I = 1 to W
      ii.  F[I] = GENE[I:I+Q-1]
      iii. I = I + Q
      iv.  Go to ii

   b. Generate the R[] set of oligos
      i.   J = W
      ii.  For I = 1 to W
      iii. R[I] = GENE[W:W-Q]
      iv.  J = J - Q
      v.   Go to iii

   c. Result is two sets of oligos F[] and R[]
      of Q length

   d. Generate the final two finishing oligos
      i.  S[1] = GENE[Q/2:1]
      ii. S[2] = GENE[L-Q/2:L]
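
One possible reading of steps 1 and 2 is sketched below in Python. It is an interpretation offered for illustration, not the inventors' implementation: the names `back_translate` and `parse_gene` are hypothetical, the codon dictionary uses the preferred codons of the codon table in this description with cysteine mapped to TGC as in the Perl listing of ALGORITHM 1, and the layout of the (-) strand is an assumption. In this sketch the F oligos tile the (+) strand end to end, the R oligos are reverse complements of windows shifted by Q/2 so that each bridges two adjacent F oligos, and the two finishing oligos S cover the remaining half-length ends. Q is assumed to be even and to divide the gene length exactly.

```python
# Preferred codons keyed by one-letter amino acid code ("*" = stop).
CLASS_II_CODONS = {
    "F": "TTC", "S": "TCT", "Y": "TAC", "C": "TGC", "W": "TGG", "I": "ATC",
    "M": "ATG", "T": "ACC", "L": "CTG", "P": "CCG", "H": "CAC", "Q": "CAG",
    "R": "CGT", "V": "GTT", "A": "GCG", "N": "AAC", "K": "AAA", "D": "GAC",
    "E": "GAA", "G": "GGT", "*": "TGA",
}
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def back_translate(protein):
    """Step 1c: build GENE from AA using one codon per amino acid."""
    return "".join(CLASS_II_CODONS[aa] for aa in protein.upper())


def parse_gene(gene, q):
    """Step 2: split gene into F, R and finishing (S) oligos.

    Returns (forward, reverse, finishing); reverse and finishing oligos
    are written 5' to 3' as they would be synthesized.
    """
    if q % 2 or len(gene) % q:
        raise ValueError("sketch assumes even Q and gene length divisible by Q")
    half = q // 2
    w = len(gene) // q
    # (+) strand: W contiguous oligos of length Q.
    forward = [gene[k * q:(k + 1) * q] for k in range(w)]
    # (-) strand: windows shifted by Q/2, each overlapping two F oligos.
    reverse = [
        reverse_complement(gene[k * q + half:(k + 1) * q + half])
        for k in range(w - 1)
    ]
    # Half-length finishing oligos that fill in the two ends.
    finishing = [reverse_complement(gene[:half]), reverse_complement(gene[-half:])]
    return forward, reverse, finishing
```

For a ten-residue protein and Q = 10, this yields three F oligos, two full-length R oligos and two five-base finishing oligos, so that every base of the gene is covered on both strands.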



Subsequently, if desired, oligonucleotide set
assembly can be established by the following algorithm:

Two sets of oligonucleotides F[1:W] R[1:Z] S[1:2]

3. Step 1
   a. For I = 1 to W
   b. Anneal F[I], F[I+1], R[I]; place in T[I]
   c. Anneal F[I+2], R[I+1], R[I+2]; place in T[I+1]
   d. I = I + 3
   e. Go to b

   Step 2
   a. Do the following until only a single
      reaction remains
      i.   For I = 1 to W/3
      ii.  Ligate T[I], T[I+1]
      iii. I = I + 2
      iv.  Go to ii
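
The repeated pairwise ligation of Step 2 can be illustrated with a short Python sketch. The function name `pool_pairwise` and the representation of each reaction T[I] as a tuple of labels are assumptions for the example; carrying an unpaired last pool forward unchanged is also an assumption, since the algorithm above does not say how an odd count is handled.

```python
def pool_pairwise(pools):
    """Combine adjacent pools two at a time until one pool remains.

    Returns the list of pools after every round, starting with the input.
    """
    if not pools:
        raise ValueError("need at least one pool")
    rounds = [list(pools)]
    while len(rounds[-1]) > 1:
        current = rounds[-1]
        merged = []
        for i in range(0, len(current), 2):
            if i + 1 < len(current):
                merged.append(current[i] + current[i + 1])  # ligate T[I], T[I+1]
            else:
                merged.append(current[i])  # unpaired pool is carried forward
        rounds.append(merged)
    return rounds


rounds = pool_pairwise([("T1",), ("T2",), ("T3",), ("T4",)])
print([len(r) for r in rounds])
```

Four triplex reactions reduce to two and then to one, so the number of pooling rounds grows with the logarithm of the number of starting reactions.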



CODON TABLE (E. coli Class II preferred usage)

PHE  TTC      SER  TCT      TYR  TAC
CYS  TGC      TER  TGA      TRP  TGG
ILE  ATC      MET  ATG      THR  ACC
LEU  CTG      PRO  CCG      HIS  CAC
GLN  CAG      ARG  CGT      VAL  GTT
ALA  GCG      ASN  AAC      LYS  AAA
ASP  GAC      GLU  GAA      GLY  GGT
Algorithms of the invention useful for assembly
of a target polynucleotide can further be described as
Perl script as set forth below. ALGORITHM 1 provides a
method for converting a protein sequence into a
polynucleotide sequence using E. coli codons:

#$sequence is the protein sequence in single letter amino acid code
#$seqlen is the length of the protein sequence
#$aminoacid is the individual amino acid in the sequence
#$codon is the individual DNA triplet codon in the gene sequence
#$DNAsequence is the gene sequence in DNA bases
#$baselen is the length of the DNA sequence in bases

$seqlen = length($sequence);
$baselen = $seqlen * 3;
$DNAsequence = "";
for ($n = 0; $n < $seqlen; $n++)
{   $aminoacid = substr($sequence, $n, 1);

    #The following list provides the class II codon preference
    #in Perl for E. coli
    if    ($aminoacid eq "m") {$codon = "ATG";}
    elsif ($aminoacid eq "f") {$codon = "TTC";}
    elsif ($aminoacid eq "l") {$codon = "CTG";}
    elsif ($aminoacid eq "s") {$codon = "TCT";}
    elsif ($aminoacid eq "y") {$codon = "TAC";}
    elsif ($aminoacid eq "c") {$codon = "TGC";}
    elsif ($aminoacid eq "w") {$codon = "TGG";}
    elsif ($aminoacid eq "i") {$codon = "ATC";}
    elsif ($aminoacid eq "t") {$codon = "ACC";}
    elsif ($aminoacid eq "p") {$codon = "CCG";}
    elsif ($aminoacid eq "q") {$codon = "CAG";}
    elsif ($aminoacid eq "r") {$codon = "CGT";}
    elsif ($aminoacid eq "v") {$codon = "GTT";}
    elsif ($aminoacid eq "a") {$codon = "GCG";}
    elsif ($aminoacid eq "n") {$codon = "AAC";}
    elsif ($aminoacid eq "k") {$codon = "AAA";}
    elsif ($aminoacid eq "d") {$codon = "GAC";}
    elsif ($aminoacid eq "e") {$codon = "GAA";}
    elsif ($aminoacid eq "g") {$codon = "GGT";}
    elsif ($aminoacid eq "h") {$codon = "CAC";}
    else  {$codon = "";}

    #append the codon; "." is the Perl string concatenation operator
    $DNAsequence = $DNAsequence . $codon;
}

ALGORITHM 2 provides a method for parsing a
polynucleotide sequence into component forward and
reverse oligonucleotides that can be reassembled into a
complete target polynucleotide encoding a target
polypeptide:

#$oligoname is the identifier name for the list and for each
#component oligonucleotide
#$OL is the length of each component oligonucleotide
#$Overlap is the length of the overlap in bases between each forward
#and each reverse oligonucleotide
#$sequence is the DNA sequence in bases
#$seqlen is the length of the DNA sequence in bases
#$bas is the individual base in a sequence
#$forseq is the sequence of a forward oligonucleotide
#$revseq is the sequence of a reverse oligonucleotide
#$revcomp is the reverse complemented sequence of the gene
#$oligonameF[] is the list of parsed forward oligos
#$oligonameR[] is the list of parsed reverse oligos

$Overlap = <STDIN>;
$seqlen = length($sequence);

#convert forward sequence to upper case if lower case
$forseq = "";
for ($j = 0; $j <= $seqlen-1; $j++)
{   $bas = substr($sequence, $j, 1);
    if    ($bas eq "a") {$cfor = "A";}
    elsif ($bas eq "t") {$cfor = "T";}
    elsif ($bas eq "c") {$cfor = "C";}
    elsif ($bas eq "g") {$cfor = "G";}
    elsif ($bas eq "A") {$cfor = "A";}
    elsif ($bas eq "T") {$cfor = "T";}
    elsif ($bas eq "C") {$cfor = "C";}
    elsif ($bas eq "G") {$cfor = "G";}
    else  {$cfor = "X"};
    $forseq = $forseq . $cfor;
}
print OUT "$forseq\n";

The reverse complement of the sequence generated above is identified 
by: 



$revcomp = "";
for ($i = $seqlen-1; $i >= 0; $i--)
{
    $base = substr($sequence, $i, 1);
    if    ($base eq "a") {$comp = "T";}
    elsif ($base eq "t") {$comp = "A";}
    elsif ($base eq "g") {$comp = "C";}
    elsif ($base eq "c") {$comp = "G";}
    elsif ($base eq "A") {$comp = "T";}
    elsif ($base eq "T") {$comp = "A";}
    elsif ($base eq "G") {$comp = "C";}
    elsif ($base eq "C") {$comp = "G";}
    else  {$comp = "X"};
    $revcomp = $revcomp . $comp;
}



#now do the parsing
#generate the forward oligo list
print OUT "Forward oligos\n";
print "Forward oligos\n";
$r = 1;
for ($i = 0; $i <= $seqlen -1; $i+=$OL)
{   $oligo = substr($sequence, $i, $OL);
    print OUT "$oligname F- $r $oligo\n";
    print "$oligname F- $r $oligo\n";
    $r = $r + 1;
}

#generate the reverse oligo list
$r = 1;
for ($i = $seqlen - $Overlap - $OL; $i >= 0; $i-=$OL)
{
    print OUT "\n";
    print "\n";
    $oligo = substr($revcomp, $i, $OL);
    print OUT "$oligname R- $r $oligo";
    print "$oligname R- $r $oligo";
    $r = $r + 1;
}

#Rectify and print out the last reverse oligo consisting of 1/2 from
#the beginning of the reverse complement.
#substr offsets start at 0, so offset 0 is the first base.
$oligo = substr($revcomp, 0, $Overlap);
print OUT "$oligo\n";
print "$oligo\n";
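
The reverse-complement loop of ALGORITHM 2 can be written more compactly. The following Python sketch is offered only as an illustration of the same mapping (lower case accepted, any base other than G, A, T or C becoming X); the name `reverse_complement_x` is hypothetical and the code is not part of the original listing.

```python
_PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement_x(sequence):
    """Walk the sequence from its last base to its first, emitting the
    complement of each base; unrecognized bases become X."""
    return "".join(_PAIR.get(base, "X") for base in reversed(sequence.upper()))


print(reverse_complement_x("aacgN"))
```

A dictionary lookup with a default value replaces the eight-branch if/elsif chain while keeping the same behavior for each input base.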



The invention further provides a computer-assisted
method for synthesizing a target polynucleotide
encoding a target polypeptide derived from a model
sequence using a programmed computer including a
processor, an input device, and an output device, by
inputting into the programmed computer, through the input
device, data including at least a portion of the target
polynucleotide sequence encoding a target polypeptide.
Subsequently, the sequence of at least one initiating
polynucleotide present in the target polynucleotide
sequence is determined and a model for synthesizing the
target polynucleotide sequence is derived. The model is
based on the position of the initiating sequence in the
target polynucleotide sequence using overall sequence
parameters necessary for expression of the target
polypeptide in a biological system. The information is
outputted to an output device which provides the means
for synthesizing and assembling the target polynucleotide.

It is understood that any apparatus suitable
for polynucleotide synthesis can be used in the present
invention. Various non-limiting examples of apparatus,
components, assemblies and methods are described below.
For example, in one embodiment, it is contemplated that a
nanodispensing head with up to 16 valves can be used to
deposit synthesis chemicals in assembly vessels (Figure
4). Chemicals can be controlled using a syringe pump
from the reagent reservoir. Because of the speed and
capability of the ink-jet dispensing system, synthesis
can be made very small and very rapid. Underlying the
reaction chambers is a set of assembly vessels linked to
microchannels that will move fluids by microfluidics.
The configuration of the channels will pool pairs and
triplexes of oligonucleotides systematically using, for
example, a robotic device. However, pooling can be
accomplished using fluidics and without moving parts.

As shown in Figure 5, oligonucleotide
synthesis, oligonucleotide assembly by pooling and
annealing, and ligation can be done using microfluidic
mixing, resulting in the same set of critical triplex
intermediates that serves as the substrate for annealing,
ligation and oligonucleotide joining. DNA ligase and
other components can be placed in the buffer fluid moving
through the instrument microchambers. Thus, synthesis
and assembly can be carried out in a highly controlled
way in the same instrument.



As shown in Figure 6, the pooling manifold can
be produced from non-porous plastic and designed to
control sequential pooling of oligonucleotides
synthesized in arrays. Oligonucleotide parsing from a
gene sequence designed in the computer can be programmed
for synthesis where (+) and (-) strands are placed in
alternating wells of the array. Following synthesis in
this format, the 12 row sequences of the gene are
directed into the pooling manifold that systematically
pools three wells into reaction vessels forming the
critical triplex structure. Following temperature
cycling for annealing and ligation, four sets of
triplexes are pooled into 2 sets of 6 oligonucleotide
products, then 1 set of 12 oligonucleotide products.
Each row of the synthetic array is associated with a
similar manifold resulting in the first stage of assembly
of 8 sets of assembled oligonucleotides representing 12
oligonucleotides each. As shown in Figure 7, the second
manifold pooling stage is controlled by a single manifold
that pools the 8 row assemblies into a single complete
assembly. Passage of the oligonucleotide components
through the two manifold assemblies (the first 8 and the
second single) results in the complete assembly of all 96
oligonucleotides from the array. The assembly module
(Figure 8) of Genewriter™ can include a complete set of 7
pooling manifolds produced using microfabrication in a
single plastic block that sits below the synthesis
vessels. Various configurations of the pooling manifold
will allow assembly of 96, 384 or 1536 well arrays of
parsed component oligonucleotides.



The initial configuration is designed for the
assembly of 96 oligonucleotides synthesized in a
pre-defined array, composed of 48 pairs of overlapping
50-mers. Passage through the assembly device in the
presence of DNA ligase and other appropriate buffer and
chemical components, and with appropriate temperature
controls on the device, will assemble these into a single
2400-base double-stranded gene assembly (Figure 9).
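
The 2400-base figure follows from the array size, as the short Python check below shows. It is a sketch under the assumption that the oligonucleotides are split evenly between the two strands and that each strand is tiled end to end; the name `duplex_length` is hypothetical.

```python
def duplex_length(n_oligos, oligo_len):
    """Length of the double-stranded product when n_oligos are split
    evenly over the (+) and (-) strands and tiled end to end."""
    if n_oligos % 2:
        raise ValueError("sketch assumes an even number of oligonucleotides")
    return (n_oligos // 2) * oligo_len


print(duplex_length(96, 50))
```

With 96 oligonucleotides of 50 bases, each strand receives 48 oligonucleotides, giving 48 x 50 = 2400 bases.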

The basic pooling device design can be made of Plexiglas™
or other type of co-polymer with microgrooves or
microfluidic channels etched into the surface and with a
temperature control element such as a Peltier circuit
underlying the junction of the channels. This results in
a microreaction vessel at the junction of two channels
for 1) mixing of the two streams, 2) controlled
temperature maintenance or cycling at the site of the
junction and 3) expulsion of the ligated mixture from the
exit channel into the next set of pooling and ligation
chambers.



As shown in Figure 11, the assembly platform
design can consist of 8 synthesis microwell plates in a
96 well configuration, addressed with 16 channels of
microdispensing. Below each plate is: 1) an evacuation
manifold for removing synthesis components; and 2) an
assembly manifold based on the schematic in Figure 9 for
assembling component oligonucleotides from each 96-well
array. Figure 12 shows a higher capacity assembly format
using 1536-well microplates and capable of synthesis of
1536 component oligonucleotides per plate. Below each
plate is: 1) an evacuation manifold for removing
synthesis components; and 2) an assembly manifold
assembly for assembling 1536 component oligonucleotides
from each 1536-well array. Pooling and assembly
strategies can be based on the concepts used for 96-well
plates.



An alternative assembly format includes using
surface-bound oligonucleotide synthesis rather than
soluble synthesis on CPG glass beads (Figure 13). In
this configuration, oligonucleotides are synthesized with
a hydrocarbon linker that allows attachment to a solid
support. Following parsing of component sequences and
synthesis, the synthesized oligonucleotides are
covalently attached to a solid support such that the
stabilizer is attached and the two ligation substrates
added to the overlying solution. Ligation occurs as
mediated by DNA ligase in the solution, and increasing
temperature above the Tm removes the linked
oligonucleotides by thermal melting. As shown in Figure
14, the systematic assembly on a solid support of a set of
parsed component oligonucleotides can be arranged in an
array with the set of stabilizer oligonucleotides
attached. The set of ligation substrate oligonucleotides
is placed in the solution, and systematic assembly is
carried out in the solid phase by sequential annealing,
ligation and melting, which moves the growing DNA
molecules across the membrane surface.



Figure 15 shows an additional alternative means
for oligonucleotide assembly, by binding the component
oligonucleotides to a set of metal electrodes on a
microelectronic chip, where each electrode can be
controlled independently with respect to current and
voltage. The array contains the set of minus strand
oligonucleotides. Placing a positive charge on the
electrode will move by electrophoresis the component
ligase substrate oligonucleotide onto the surface where
annealing takes place. The presence of DNA ligase
mediates covalent joining or ligation of the components.
The electrode is then turned off or a negative charge is
applied and the DNA molecule is expelled from the electrode.
The next array element containing the next stabilizer
oligonucleotide from the parsed set is turned on with a
positive charge and a second annealing, joining and
ligation with the next oligonucleotide in the set is carried
out. Systematic and repetitive application of voltage
control, annealing, ligation and denaturation will result
in the movement of the growing chain across the surface
as well as assembly of the components into a complete DNA
molecule.

The invention further provides methods for the
automated synthesis of target polynucleotides. For
example, a desired sequence can be ordered by any means
of communication available to a user wishing to order
such a sequence. A "user", as used herein, is any entity
capable of communicating a desired polynucleotide
sequence to a server. The sequence may be transmitted by
any means of communication available to the user and
receivable by a server. The user can be provided with a
unique designation such that the user can obtain
information regarding the synthesis of the polynucleotide
during synthesis. Once obtained, the transmitted target
polynucleotide sequence can be synthesized by any method
set forth in the present invention.

The invention further provides a method for
automated synthesis of a polynucleotide, by providing a
user with a mechanism for communicating a model
polynucleotide sequence and optionally providing the user
with an opportunity to communicate at least one desired
modification to the model sequence. The invention
envisions a user providing a model sequence and a desired
modification to that sequence which results in the
alteration of the model sequence. Any modification that
alters the expression, function or activity of a target
polynucleotide or encoded target polypeptide can be
communicated by the user such that a modified
polynucleotide or polypeptide is synthesized or expressed
according to a method of the invention. For example, a
model polynucleotide encoding a polypeptide normally
expressed in a eukaryotic system can be altered such that
the codons of the resulting target polynucleotide are
conducive for expression of the polypeptide in a
prokaryotic system. In addition, the user can indicate a
desired modified activity of a polypeptide encoded by a
model polynucleotide. Once provided, the algorithms and
methods of the present invention can be used to
synthesize a target polynucleotide encoding a target
polypeptide believed to have the desired modified
activity. The methods of the invention can be further
utilized to express the target polypeptide and to screen
for the desired activity. It is understood that the
methods of the invention provide a means for synthetic
evolution whereby any parameter of polynucleotide
expression and/or polypeptide activity can be altered as
desired.

Once the transmitted model sequence and desired
modification are provided by the user, the data including
at least a portion of the model polynucleotide sequence
are inputted into a programmed computer, through an input
device. Once inputted, the algorithms of the invention
are used to determine the sequence of the model
polynucleotide sequence containing the desired
modification and resulting in a target polynucleotide
containing the modification. Subsequently, the processor
and algorithms of the invention are used to identify at
least one initiating polynucleotide sequence present in
the polynucleotide sequence. A target polynucleotide
(i.e., a modified model polynucleotide) is identified and
synthesized.

EXAMPLES 

Nucleic Acid Synthesis Design Protocol 

For the purposes of assembling a synthetic
nucleic acid sequence encoding a target polypeptide, a
model polypeptide sequence or nucleic acid sequence is
obtained and analyzed using a suitable DNA analysis
package, such as, for example, MacVector or DNA Star. If
the target protein will be expressed in a bacterial
system, for example, the model sequence can be converted
to a sequence encoding a polypeptide utilizing E. coli
preferred codons (i.e., Type I, Type II or Type III codon
preference). The present invention provides the
conversion programs Codon I, Codon II or Codon III.
However, a nucleic acid sequence of the invention can be
designed to accommodate any codon preference of any
prokaryotic or eukaryotic organism.



In addition to the above codon preferences,
specific promoter, enhancer, replication or drug
resistance sequences can be included in a synthetic
nucleic acid sequence of the invention. The length of
the construction can be adjusted by padding to give a
round number of bases based on about 25 to 100 bp
synthesis. Sequences of about 25 to 100 bp in length
can be manufactured and assembled using the
array synthesizer system and may be used without further
purification. For example, two 96-well plates containing
100-mers could give a 9600 bp construction of a target
sequence.
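
The padding step can be illustrated with a brief Python sketch. The text does not say which bases are used for padding or where they are placed, so the filler base "A" and the choice to pad at the 3' end are assumptions of this example, as are the function names.

```python
def padded_length(n_bases, oligo_len):
    """Smallest multiple of oligo_len that is >= n_bases."""
    return -(-n_bases // oligo_len) * oligo_len  # ceiling division


def pad_sequence(sequence, oligo_len, filler="A"):
    """Append filler bases (an assumed choice) until the length is a
    whole number of oligonucleotides."""
    missing = padded_length(len(sequence), oligo_len) - len(sequence)
    return sequence + filler * missing


print(padded_length(9550, 100))
print(pad_sequence("ATGC", 3))
```

A 9550-base design, for instance, would be padded to 9600 bases, which matches the capacity quoted above for two 96-well plates of 100-mers (2 x 96 x 100 bases shared between two strands).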



Subsequent to the design of the
oligonucleotides needed for assembly of the target
sequence, the oligonucleotides are parsed using
ParseOligo™, a proprietary computer program that
optimizes nucleic acid sequence assembly. Optional steps
in sequence assembly include identifying and eliminating
sequences that may give rise to hairpins, repeats or
other difficult sequences. The parsed oligonucleotide
list is transferred to the Synthesizer driver software.
The individual oligonucleotides are pasted into the wells
and oligonucleotide synthesis is accomplished.

The ParseOligo program reads a DNA sequence
from a file and parses it into two sets of
oligonucleotides, one set forward and one reverse, for
synthetic gene assembly.
The input file format is as follows:

#Should be a plain text file
#Sequence name on the first line followed by paragraph mark
#The entire DNA sequence should be next without spaces or paragraph
#marks.
#
#The DNA sequence can be upper or lower case and lower case will be
#converted.
#
#Any base other than G, A, T or C will be converted to X.
#
print "                    PARSEOLIGOS\n";
print "                       1999\n";
print "\n";
print "\n";
print "   Parse - A program for parsing a DNA sequence\n";
print "   into component oligonucleotides for synthetic gene assembly.\n";
print "\n";
print "\n";
print "   written by Glen A. Evans copyright c 1999.\n";
print "\n";
print "\n";
print "\n";

print "Enter name of the input DNA sequence file: ";
$a = <STDIN>;
chomp $a;
print "\n";
print "\n";

print "Enter the name of the output DNA oligonucleotide file: ";
$b = <STDIN>;
chomp $b;
print "\n";
print "\n";
open (IN, $a) || die "cannot open $a for reading: $!";
open (OUT, ">$b") || die "cannot create $b: $!";
print "\n";
print "\n";

print "Enter the name for the oligo lists: ";
$oligname = <STDIN>;
chomp $oligname;
print "\n";
print "\n";

print "Enter the length of oligonucleotides: ";
$OL = <STDIN>;
chomp $OL;
print "\n";
print "\n";

print "Enter the required overlap: ";
$Overlap = <STDIN>;
chomp $Overlap;
#This is heart of the program - The rest is I/O.
$sequence = "";
#the loop test consumes the name line; the body reads the sequence line
while (<IN>)
{   $sequence = <IN>;
}
chomp $sequence;

print OUT "The input DNA sequence is: \n";
print OUT "\n";
print OUT "$sequence";
print OUT "\n";
print OUT "\n";

print "The input DNA sequence is: \n";
print "\n";
print "$sequence";
print "\n";
print "\n";

$seqlen = length($sequence);
print OUT "The sequence is $seqlen bases long \n";
print OUT "\n";
print OUT "\n";

print "The sequence is $seqlen bases long \n";
print "\n";
print "\n";

#convert forward sequence to upper case if lower case
print OUT "The forward sequence converted to upper case \n";
print OUT "\n";
print OUT "\n";

print "The forward sequence converted to upper case \n";
print "\n";
print "\n";



$forseq = "";
for ($j = 0; $j <= $seqlen-1; $j++)
{   $bas = substr($sequence, $j, 1);
    if    ($bas eq "a") {$cfor = "A";}
    elsif ($bas eq "t") {$cfor = "T";}
    elsif ($bas eq "c") {$cfor = "C";}
    elsif ($bas eq "g") {$cfor = "G";}
    elsif ($bas eq "A") {$cfor = "A";}
    elsif ($bas eq "T") {$cfor = "T";}
    elsif ($bas eq "C") {$cfor = "C";}
    elsif ($bas eq "G") {$cfor = "G";}
    else  {$cfor = "X"};
    $forseq = $forseq . $cfor;
    print OUT "$j \n";
}

print OUT "$forseq";
print OUT "\n";
print OUT "\n";

print "$forseq";
print "\n";
print "\n";



#reverse complement the sequence
print OUT "The reverse complement of the DNA sequence: \n";
print OUT "\n";
print OUT "\n";

print "The reverse complement of the DNA sequence: \n";
print "\n";
print "\n";

$revcomp = "";
for ($i = $seqlen-1; $i >= 0; $i--)
{
    $base = substr($sequence, $i, 1);
    if    ($base eq "a") {$comp = "T";}
    elsif ($base eq "t") {$comp = "A";}
    elsif ($base eq "g") {$comp = "C";}
    elsif ($base eq "c") {$comp = "G";}
    elsif ($base eq "A") {$comp = "T";}
    elsif ($base eq "T") {$comp = "A";}
    elsif ($base eq "G") {$comp = "C";}
    elsif ($base eq "C") {$comp = "G";}
    else  {$comp = "X"};
    $revcomp = $revcomp . $comp;
}

print OUT "$revcomp\n";
print OUT "\n";

print "$revcomp\n";
print "\n";

#now do the parsing
#generate the oligo list
print OUT "Forward oligos\n";
print "Forward oligos\n";
$r = 1;
for ($i = 0; $i <= $seqlen -1; $i+=$OL)
{   $oligo = substr($sequence, $i, $OL);
    print OUT "$oligname F- $r $oligo\n";
    print "$oligname F- $r $oligo\n";
    $r = $r + 1;
}

print OUT "\n";
print OUT "Reverse oligos\n";
print "Reverse oligos\n";
$r = 1;
for ($i = $seqlen - $Overlap - $OL; $i >= 0; $i-=$OL)
{
    print OUT "\n";
    print "\n";
    $oligo = substr($revcomp, $i, $OL);
    print OUT "$oligname R- $r $oligo";
    print "$oligname R- $r $oligo";
    $r = $r + 1;
}

#Rectify and print out the last reverse oligo consisting of 1/2 from
#the beginning of the reverse complement.
#substr offsets start at 0, so offset 0 is the first base.
$oligo = substr($revcomp, 0, $Overlap);
print OUT "$oligo\n";
print "$oligo\n";

#close files and exit
close (IN) || die "can't close $a:$!";
close (OUT) || die "can't close $b:$!";

print "\n";
print "\n";
print "Processing completed.\n";
print "\n";
print "\n";
print "Have a nice day!\n";
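
The input file convention described above (name on the first line, sequence on the following line, lower case converted, any base other than G, A, T or C replaced by X) can be sketched in Python as follows. This is an illustrative reader, not a port of ParseOligo; the function name `read_sequence_file` is hypothetical, and joining any additional lines into the sequence is a leniency assumed here that the original format does not require.

```python
import io


def read_sequence_file(handle):
    """Read a name line followed by the DNA sequence; return
    (name, normalized sequence)."""
    name = handle.readline().strip()
    # Assumption: tolerate a sequence wrapped over several lines.
    raw = "".join(line.strip() for line in handle)
    sequence = "".join(b if b in "GATC" else "X" for b in raw.upper())
    return name, sequence


example = io.StringIO("demo\natgcnn\n")
print(read_sequence_file(example))
```

Reading the name with a separate `readline` call avoids relying on the loop condition to discard the first line, which is how the Perl listing skips it.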
Assembly of Parsed Oligonucleotides Using a Two-Step PCR
Reaction:

Obtain arrayed sets of parsed overlapping
oligonucleotides, 50 bases each, with an overlap of about
25 base pairs (bp). The oligonucleotide concentration is
250 µM (250 nM/ml). 50 base oligos give Tms from 75
to 85 degrees C, 6 to 10 OD260, 11 to 15 nanomoles, 150 to
300 µg. Resuspend in 50 to 100 µl of H2O to make 250
nM/ml. Combine equal amounts of each oligonucleotide to a
final concentration of 250 µM (250 nM/ml). Add 1 µl of
each to give 192 µl. Add 8 µl dH2O to bring up to 200
µl. Final concentration is 250 µM mixed oligos. Dilute
250-fold by taking 10 µl of mixed oligos and adding to 1 ml
of water (1/100; 2.5 µM), then take 1 µl of this and
add to 24 µl 1X PCR mix. The PCR reaction includes:

   10 mM TRIS-HCl, pH 9.0
   2.2 mM MgCl2
   50 mM KCl
   0.2 mM each dNTP
   0.1% Triton X-100

One U TaqI polymerase is added to the reaction. The
reaction is thermocycled under the following conditions:

   a. Assembly
      i. 55 cycles of
         1. 94 degrees 30 s
         2. 52 degrees 30 s
         3. 72 degrees 30 s

Following assembly amplification, take 2.5 µl of this
assembly mix and add to 100 µl of PCR mix (40X
dilution). Prepare outside primers by taking 1 µl of F1
(forward primer) and 1 µl of R96 (reverse primer) at 250
µM (250 nM/ml = 0.250 nmole/µl) and add to the 100 µl PCR
reaction. This gives a final concentration of 2.5 µM each
oligo. Add 1 U TaqI polymerase and thermocycle under the
following conditions:

   35 cycles (or original protocol 23 cycles)
      94 degrees 30 s
      50 degrees 30 s
      72 degrees 60 s

Extract with phenol/chloroform. Precipitate with
ethanol. Resuspend in 10 µl of dH2O and analyze on an
agarose gel.
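
The dilution figures in this protocol can be checked with a small Python helper. This is a worked-arithmetic sketch only; it assumes the stock concentration is 250 µM, and the function name `diluted_concentration` is hypothetical.

```python
def diluted_concentration(stock_conc, stock_volume, diluent_volume):
    """Concentration after adding stock_volume of stock to diluent_volume.

    Volumes must share one unit; the result has the unit of stock_conc.
    """
    return stock_conc * stock_volume / (stock_volume + diluent_volume)


# 10 µl of 250 µM mixed oligos into 1 ml (1000 µl) of water
print(round(diluted_concentration(250.0, 10.0, 1000.0), 2))
# 1 µl of a 250 µM outside primer into a 100 µl PCR reaction
print(round(diluted_concentration(250.0, 1.0, 100.0), 2))
```

Both cases come out just under 2.5 µM because the added volume slightly increases the total; the protocol's rounded value of 2.5 µM corresponds to treating each step as an exact 1/100 dilution.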

Assembly of Parsed Oligonucleotides Using TaqI Ligation

Arrayed sets of parsed overlapping
oligonucleotides of about 25 to 150 bases in length each,
with an overlap of about 12 to 75 base pairs (bp), are
obtained. The oligonucleotide concentration is 250
µM (250 nM/ml). For example, 50 base oligos give Tms
from 75 to 85 degrees C, 6 to 10 OD260, 11 to 15
nanomoles, 150 to 300 µg. Resuspend in 50 to 100 µl of
H2O to make 250 nM/ml.

Using a robotic workstation, equal amounts of
forward and reverse oligos are combined pairwise. Take
10 µl of forward and 10 µl of reverse oligo and mix in a
new 96-well v-bottom plate. This gives one array with
sets of duplex oligonucleotides at 250 µM, according to
pooling scheme Step 1 in Table 1. Prepare an assembly
plate by taking 2 µl of each oligomer pair and adding to
a fresh plate containing 100 µl of ligation mix in each
well. This gives an effective concentration of 2.5 µM or
2.5 nM/ml. Transfer 20 µl of each well to a fresh
microwell plate and add 1 µl of T4 polynucleotide kinase
and 1 µl of 1 mM ATP to each well. Each reaction will
have 50 pmoles of oligonucleotide and 1 nmole ATP.
Incubate at 37 degrees C for 30 minutes.
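
The quantities quoted for the kinase step follow from concentration times volume, as this short Python check shows. It is a worked-arithmetic sketch; the function name `picomoles` is hypothetical.

```python
def picomoles(conc_micromolar, volume_microlitres):
    """Amount in pmol: (µmol/L) x (µL) = pmol."""
    return conc_micromolar * volume_microlitres


print(picomoles(2.5, 20))     # oligonucleotide in 20 µl at 2.5 µM
print(picomoles(1000.0, 1))   # ATP in 1 µl at 1 mM (1000 µM)
```

Twenty microlitres at 2.5 µM contain 50 pmol of oligonucleotide, and one microlitre of 1 mM ATP contains 1000 pmol, that is 1 nmol, in agreement with the text.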

Initiate assembly according to Steps 2-7 of
Table 1. Carry out pooling Step 2 mixing each successive
well with the next. Add 1 µl of TaqI ligase to each
mixed well. Cycle once at 94 degrees for 30 sec; 52
degrees for 30 s; then 72 degrees for 10 minutes.

Carry out step 3 (Table 1) of pooling scheme
and cycle according to the temperature scheme above.
Carry out steps 4 and 5 of the pooling scheme and cycle
according to the temperature scheme above. Carry out
pooling scheme step 6 and take 10 µl of each mix into a
fresh microwell. Carry out step 7 pooling scheme by
pooling the remaining three wells. Reaction volumes will
be:

   Initial plate has 20 µl per well.
   Step 2   20 µl + 20 µl = 40 µl
   Step 3   80 µl
   Step 4   160 µl
   Step 5   230 µl
   Step 6   10 µl + 10 µl = 20 µl
   Step 7   20 + 20 + 20 = 60 µl final reaction volume

A final PCR amplification was then performed by
taking 2 µl of final ligation mix and adding it to 20 µl of PCR
mix containing 10 mM TRIS-HCl, pH 9.0, 2.2 mM MgCl2, 50
mM KCl, 0.2 mM each dNTP and 0.1% Triton X-100.

Prepare outside primers by taking 1 µl of F1
(forward primer) and 1 µl of R96 (reverse primer) at 250
µM (250 nM/ml = 0.250 nmole/µl) and add to the 100 µl PCR
reaction giving a final concentration of 2.5 µM each
oligo. Add 1 U TaqI polymerase and cycle for 35 cycles
under the following conditions: 94 degrees for 30 s; 50
degrees for 30 s; and 72 degrees for 60 s. Extract the
mixture with phenol/chloroform. Precipitate with
ethanol. Resuspend in 10 µl of dH2O and analyze on an
agarose gel.
Table 1. Pooling scheme for ligation assembly.
Ligation method - Well pooling scheme

FROM    TO      STEP    FROM    TO
All F   All R   3       A2      A4
                        A6      A8
A1      A2              A10     A12
A3      A4              B2      B4
A5      A6              B6      B8
A7      A8              B10     B12
A9      A10             C2      C4
A11     A12             C6
Assembly of Parsed Oligonucleotides Using Taq I
Synthesis and Assembly

Arrayed sets of parsed overlapping
oligonucleotides of about 25 to 150 bases in length each,
with an overlap of about 12 to 75 base pairs (bp), are
obtained. The oligonucleotide concentration is 250
µM (250 nM/ml). 50 base oligos give Tms from 75 to 85
degrees C, 6 to 10 OD260, 11 to 15 nanomoles, 150 to 300
µg. Resuspend in 50 to 100 µl of H2O to make 250 nM/ml.

The invention envisions using a robotic
workstation to accomplish nucleic acid assembly. In the
present example, two working plates containing forward
and reverse oligonucleotides in a PCR mix at 2.5 µM are
prepared and 1 µl of each oligo is added to 100 µl of
PCR mix in a fresh microwell providing one plate of
forward and one of reverse oligos in an array. Cycling
assembly is then initiated as follows according to the
pooling scheme outlined in Table 1. In the present
example, 96 cycles of assembly can be accomplished
according to this scheme.

Remove 2 µl of well F-E1 to a fresh well;
remove 2 µl of R-E1 to a fresh well; add 18 µl of 1X PCR
mix; add 1 U of TaqI polymerase;
Cycle once:   94 degrees 30 s
              52 degrees 30 s
              72 degrees 30 s

Subsequently, remove 2 µl of well F-E2 to the reaction
vessel; remove 2 µl of well R-D12 to the reaction vessel.
Cycle once according to the temperatures above. Repeat
the pooling and cycling according to the scheme outlined
in Table 1 for about 96 cycles.

A PCR amplification is then performed by taking 
2 µl of final reaction mix and adding it to 20 µl of a 
PCR mix comprising: 

10 mM TRIS-HCl, pH 9.0 
2.2 mM MgCl2 
50 mM KCl 
0.2 mM each dNTP 
0.1% Triton X-100 

Outside primers are prepared by taking 1 µl of F1 and 1 
µl of R96 at 250 µM (250 nmol/ml = 0.250 nmol/µl) and 
adding them to the 100 µl PCR reaction. This gives a final 
concentration of 2.5 µM each oligo. 1 U Taq I polymerase 
is subsequently added and the reaction is cycled for 
about 23 to 35 cycles under the following conditions: 

94 degrees 30 s 
50 degrees 30 s 
72 degrees 60 s 

The reaction is subsequently extracted with 
phenol/chloroform, precipitated with ethanol and 
resuspended in 10 µl of dH2O for analysis on an agarose 
gel. 
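
The primer dilution described above can be checked with a short calculation. The sketch below is illustrative only: the function name is hypothetical, and it assumes the stock primers are at 250 µM and that the final reaction volume is the quantity being divided by.

```python
def diluted_um(stock_um: float, added_ul: float, final_volume_ul: float) -> float:
    """Concentration (uM) after `added_ul` of stock ends up in `final_volume_ul`.

    Plain C1 * V1 = C2 * V2 dilution arithmetic; the stock concentration is an
    assumed input, not a measured value.
    """
    if final_volume_ul <= 0 or added_ul < 0:
        raise ValueError("volumes must be positive")
    return stock_um * added_ul / final_volume_ul


# 1 ul of a 250 uM primer in a nominal 100 ul reaction.
print(diluted_um(250.0, 1.0, 100.0))
# Same primer if the 2 ul of added primers are counted in the final volume.
print(round(diluted_um(250.0, 1.0, 102.0), 2))
```

Counting the added primer volume changes the result only slightly, which is why the nominal figure is usually quoted.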



Equal amounts of forward and reverse oligos 
are combined pairwise by taking 10 µl of forward and 10 µl 
of reverse oligo and mixing them in a new 96-well v-bottom 
plate. This provides one array with sets of duplex 
oligonucleotides at 250 µM, according to pooling scheme 
Step 1 in Table 1. An assembly plate was prepared by 
taking 2 µl of each oligomer pair and adding them to the 
plate containing 100 µl of ligation mix in each well. 
This gives an effective concentration of 2.5 µM or 2.5 
nmol/ml. About 20 µl of each well is transferred to a 
fresh microwell plate in addition to 1 µl of T4 
polynucleotide kinase and 1 µl of 1 mM ATP. Each 
reaction will have 50 pmoles of oligonucleotide and 1 
nmole ATP. Incubate at 37 degrees for 30 minutes. 

Nucleic acid assembly was initiated according to Steps 2- 
7 of Table 1. Step 2 pooling is carried out by mixing 
each well with the next well in succession. 1 µl of Taq I 
ligase is added to each mixed well and cycled once as 
follows: 

94 degrees 30 s 
52 degrees 30 s 
72 degrees 10 minutes 

Step 3 of the pooling scheme is carried out and cycled 
according to the temperature scheme above. Steps 4 and 5 
of the pooling scheme are carried out and cycled 
according to the temperature scheme above. Carry out 
pooling scheme step 6 and take 10 µl of each mix into a 
fresh microwell. Step 7 of the pooling scheme is carried 
out by pooling the remaining three wells. The reaction 
volumes will be (initial plate has 20 µl per well): 
Step 2    20 µl + 20 µl = 40 µl 
Step 3    80 µl 
Step 4    160 µl 
Step 5    320 µl 
Step 6    10 µl + 10 µl = 20 µl 
Step 7    20 + 20 + 20 = 60 µl final reaction volume 
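
The well counts and volumes of the ligation pooling steps can be traced with a small sketch. It assumes 96 starting wells of 20 µl, that Steps 2 to 5 each combine whole neighbouring wells in pairs, that Step 6 combines 10 µl aliquots in pairs, and that Step 7 pools whatever wells remain. The function and parameter names are hypothetical.

```python
def ligation_pooling(n_wells=96, start_ul=20.0, aliquot_ul=10.0, full_steps=4):
    """Return a list of (step, wells_remaining, volume_per_well_ul).

    Assumed model: `full_steps` rounds of whole-well pairwise pooling
    (Steps 2 to 5), one round of pairwise pooling from aliquots (Step 6),
    then a final pooling of all remaining wells (Step 7).
    """
    plan = []
    step, wells, volume = 1, n_wells, start_ul
    for _ in range(full_steps):
        step += 1
        wells //= 2          # two wells merge into one
        volume *= 2          # so the volume per well doubles
        plan.append((step, wells, volume))
    step += 1
    wells //= 2
    volume = 2 * aliquot_ul  # only an aliquot of each well is carried forward
    plan.append((step, wells, volume))
    step += 1
    plan.append((step, 1, wells * volume))  # pool every remaining well
    return plan


for step, wells, volume in ligation_pooling():
    print(f"Step {step}: {wells} well(s), {volume:.0f} ul each")
```

With the default values, three wells remain before the last step, which agrees with the instruction to pool the remaining three wells.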



A final PCR amplification is performed by taking 2 µl of 
the final ligation mix and adding it to 20 µl of PCR mix 
comprising: 

10 mM TRIS-HCl, pH 9.0 
2.2 mM MgCl2 
50 mM KCl 
0.2 mM each dNTP 
0.1% Triton X-100 

Outside primers are prepared by taking 1 µl of 
F1 and 1 µl of R96 at 250 µM (250 nmol/ml = 0.250 nmol/µl) 
and adding them to the 100 µl PCR reaction, giving a final 
concentration of 2.5 µM for each oligo. Subsequently, 1 
U of Taq I polymerase is added and the reaction is cycled 
for about 23 to 35 cycles under the following conditions: 

94 degrees 30 s 
50 degrees 30 s 
72 degrees 60 s 

The product is extracted with phenol/chloroform, 
precipitated with ethanol, resuspended in 10 µl of dH2O 
and analyzed on an agarose gel. 
Table 2. Pooling scheme for assembly using 
Taq I polymerase (also topoisomerase II). 

Step   Forward oligo      Reverse oligo 

1      F-E1          +    R-E1           Pause 
2      F-E2          +    R-D12          Pause 
3      F-E3          +    R-D11          Pause 
4      F-E4          +    R-D10          Pause 
5      F-E5          +    R-D9           Pause 
6      F-E6          +    R-D8           Pause 
7      F-E7          +    R-D7           Pause 
8      F-E8          +    R-D6           Pause 
9      F-E9          +    R-D5           Pause 
10     F-E10         +    R-D4           Pause 
11     F-E11         +    R-D3           Pause 
12     F-E12         +    R-D2           Pause 
13     F-F1          +    R-D1           Pause 
14     F-F2          +    R-C12          Pause 
15     F-F3          +    R-C11          Pause 
16     F-F4          +    R-C10          Pause 
17     F-F5          +    R-C9           Pause 
18     F-F6          +    R-C8           Pause 
19     F-F7          +    R-C7           Pause 
20     F-F8          +    R-C6           Pause 
21     F-F9          +    R-C5           Pause 
22     F-F10         +    R-C4           Pause 
23     F-F11         +    R-C3           Pause 
24     F-F12         +    R-C2           Pause 
25     F-G1          +    R-C1           Pause 
26     F-G2          +    R-B12          Pause 
27     F-G3          +    R-B11          Pause 
28     F-G4          +    R-B10          Pause 
29     F-G5          +    R-B9           Pause 
30     F-G6          +    R-B8           Pause 
31     F-G7          +    R-B7           Pause 
32     F-G8          +    R-B6           Pause 
33     F-G9          +    R-B5           Pause 
34     F-G10         +    R-B4           Pause 
35     F-G11         +    R-B3           Pause 
36     F-G12         +    R-B2           Pause 
37     F-H1          +    R-B1           Pause 
38     F-H2          +    R-A12          Pause 
39     F-H3          +    R-A11          Pause 
40     F-H4          +    R-A10          Pause 
41     F-H5          +    R-A9           Pause 
42     F-H6          +    R-A8           Pause 
43     F-H7          +    R-A7           Pause 
44     F-H8          +    R-A6           Pause 
45     F-H9          +    R-A5           Pause 
46     F-H10         +    R-A4           Pause 
47     F-H11         +    R-A3           Pause 
48     F-H12         +    R-A2           Pause 
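
The well pairing in Table 2 follows a regular pattern: starting from well E1 on both plates, the forward plate is walked forward in row-major order while the reverse plate is walked backward. The Python sketch below regenerates that pairing for a 96-well layout. It is an illustration of the pattern only; the function name and the `F-`/`R-` labels are conventions chosen here, not a robotic workstation API.

```python
ROWS = "ABCDEFGH"
# Row-major list of 96-well positions: A1, A2, ..., A12, B1, ..., H12.
WELLS = [f"{row}{col}" for row in ROWS for col in range(1, 13)]


def polymerase_pooling(start="E1", steps=48):
    """Return (step, forward_well, reverse_well) tuples.

    The forward plate advances one well per step from `start`; the reverse
    plate retreats one well per step from the same starting position.
    """
    origin = WELLS.index(start)
    if origin + steps > len(WELLS) or origin - (steps - 1) < 0:
        raise ValueError("requested steps run off the edge of the plate")
    return [
        (k + 1, f"F-{WELLS[origin + k]}", f"R-{WELLS[origin - k]}")
        for k in range(steps)
    ]


for step, fwd, rev in polymerase_pooling()[:3]:
    print(step, fwd, "+", rev)
```

Generating the schedule this way, rather than typing it by hand, makes it easier to check a pooling program against the table before running it.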




Table 3. Alternate 
assembly from the 5' or 
Assembly of Nucleic Acid Molecules 


The nucleic acid molecules listed in Table 4 
have been produced using the methods described herein. 
The features and characteristics of each nucleic acid 
molecule are also described in Table 4. 

As described in Table 4, a synthetic plasmid of 
4800 bp in length was assembled. The plasmid comprises 
192 oligonucleotides (two sets of 96 overlapping 50-mers; 
25 bp overlap). The plasmid is essentially pUC 
containing kanamycin resistance instead of ampicillin 
resistance. The synthetic plasmid also contains lux A 
and B genes from the Vibrio fischeri bacterial luciferase 
gene. The SynPuc19 plasmid is 2700 bp in length, 
comprising a sequence essentially identical to pUC19 only 
shortened to precisely 2700 bp. Two sets of 96 50-mers 
were used to assemble the plasmid. The Synlux4 pUC19 
plasmid was shortened and the luxA gene was added. 54 
100-mer oligonucleotides comprising two sets of 27 
oligonucleotides were used to assemble the plasmid. The 
miniQE10 plasmid comprising 2400 bp was assembled using 
48 50-mer oligonucleotides. MiniQE10 is an expression 
plasmid containing a 6X His tag and bacterial promoter 
for high-level polypeptide expression. MiniQE10 was 
assembled and synthesized using the Taq I polymerase 
amplification method of the invention. The microQE 
plasmid is a minimal plasmid containing only an 
ampicillin gene, an origin of replication and a linker of 
pQE plasmids. MicroQE was assembled using either 
combinatoric ligation with 24 50-mers or with one tube 
PCR amplification. The SynFib1, SynFibB and SynFibG 
nucleic acid sequences are synthetic human fibrinogens 
manufactured using E. coli codons to optimize expression 
in a prokaryotic expression system. 
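
The relationship between oligonucleotide count and construct size can be illustrated with a short calculation. The sketch assumes a gap-free, fully double-stranded tiling in which every oligonucleotide overlaps its neighbours on the opposite strand by half its length, so that the total number of synthesized bases is twice the duplex length. The function name is hypothetical, and the formula is a simplification that ignores any shortened end oligonucleotides.

```python
def assembled_length_bp(n_oligos: int, oligo_len: int) -> int:
    """Duplex length (bp) covered by `n_oligos` oligos of `oligo_len` bases.

    Assumes both strands are tiled completely, so the bases synthesized
    add up to two full strands.
    """
    total_bases = n_oligos * oligo_len
    if total_bases % 2:
        raise ValueError("total bases must be even for a complete duplex")
    return total_bases // 2


# 192 overlapping 50-mers (two sets of 96) with 25 bp overlap.
print(assembled_length_bp(192, 50))
# 54 overlapping 100-mers (two sets of 27).
print(assembled_length_bp(54, 100))
```

Under this assumption the two examples give 4800 bp and 2700 bp respectively.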



Table 4. Synthetic nucleic acid molecules produced using 
the methods of the invention. 

Synthetic Plasmid   4800   192                                   F01-F96 
SynLux/4 
MicroQE 
pQE25               2400   96                         circular   F1-F48 
SynFibB             1500   60    59 50mers, 1 25mer   linear     FibbF1-30 
SynFibG             1350   54    53 50mers, 1 25mer   linear     FibgF1-27 

It is to be understood that while the invention 
has been described in conjunction with the detailed 
description thereof, the foregoing description is 
intended to illustrate and not limit the scope of the 
invention, which is defined by the scope of the appended 
claims. Other aspects, advantages, and modifications are 
within the scope of the following claims. 